Abstract

A theory of how people learn certain procedural skills is presented. It is based on the idea that the
teaching and learning that goes on in a classroom is like an ordinary conversation. The speaker
(teacher) compresses a non-linear knowledge structure (the target procedure) into a linear sequence
of utterances (lessons). The listener (student) constructs a knowledge structure (the learned
procedure) from the utterance sequence (lesson sequence). In recent years, linguists have discovered
that speakers unknowingly obey certain constraints on the sequential form of their utterances.
Apparently, these tacit conventions, called felicity conditions or conversational postulates, help
listeners construct an appropriate knowledge structure from the utterance sequence. The analogy
between conversations and classrooms suggests that there might be felicity conditions on lesson
sequences that help students learn procedures. This research has shown that there are. For the
particular kind of skill acquisition studied here, three felicity conditions were discovered. They are
the central hypotheses in the learning theory. The theory has been embedded in a model, a large
computer program that uses artificial intelligence (AI) techniques. The model's performance has
been compared to data from several thousand students learning ordinary mathematical procedures:
subtracting multidigit numbers, adding fractions and solving simple algebraic equations. A key
criterion for the theory is that the set of procedures that the model "learns" should exactly match
the set of procedures that students actually acquire, including their "buggy" procedures. However,
much more is needed for psychological validation of this theory, or any complex AI-based theory,
than merely testing its predictions. Part of the research has involved finding ways to argue for the
validity of the theory.
Chapter 1
Objectives of the Research

There are two goals for the research presented here. One is psychological and the other is
methodological. The psychological goal is to formulate and validate a theory of a certain kind of
human learning. The methodological goal is to use artificial intelligence (AI) techniques to model
that learning, and to do it in such a way that the complexity of the AI-based model does not
prevent the theory from meeting rigorous criteria of scientific validity. The first section of this
chapter discusses the psychological goal; the second section discusses the methodological one. The
third section introduces the organization of the rest of the document.



1.1 The psychological goal: Step theory and repair theory

One goal of this research is a psychologically valid theory of how people learn certain
procedural skills. There are other AI-based theories of skill acquisition (e.g., Anderson, 1982;
Newell & Rosenbloom, 1981). However, their objectives differ from the ones pursued here. They
concentrate on knowledge compilation: the transformation of slow, stumbling performance into
performance that is "faster and more judicious in choice" (Anderson, 1982, pg. 404). They study
skills that are taught in a simple way: first the task is explained, then it is practiced until proficiency
is attained. For instance, Anzai and Simon (1979) modelled a subject whose skill at solving the
Tower of Hanoi puzzle evolved from a slow, stumbling first attempt into an ability to solve the
puzzle rapidly using the optimal sequence of moves. The subject received no instruction after the
initial description of the puzzle's operations and objectives. The research presented here studies
skills that are taught in a more complex way: the instruction is a lesson sequence, where each lesson
consists of explanation and practice of some small piece (subskill) of the ultimate procedural skill.
Studying multi-lesson skill acquisition shifts the central focus away from practice effects (knowledge
compilation) and towards a kind of student cognition that could be called knowledge integration: the
construction of a procedural skill from lessons on its subskills.

This study puts more emphasis on the teacher's role than the knowledge compilation research
does. It is not the case that multi-lesson skill acquisition occurs with just any lesson sequence.
Rather, the lesson sequences are designed by the teacher to facilitate knowledge integration.
Knowledge integration, in turn, is "designed" to work only with certain kinds of lesson sequences.
So, what is really being studied is a teacher-student system that has both cognitive and cultural
roots. An equally appropriate name for the central focus of this research is knowledge
communication: the transmission of a procedural skill via lessons on its subskills.

The skills chosen for the present investigation are ordinary, written mathematical calculations.
The main advantage of mathematical procedures, from the experimenter's point of view, is that they
are virtually meaningless for the learner. They seem as isolated from common sense intuitions as
the nonsense syllables of early learning research. In the case of the subtraction procedure, for
example, most elementary school students have only a dim conception of its underlying semantics,
which is rooted in the base-ten representation of numbers. When compared to the procedures they
use to operate vending machines or play games, arithmetic procedures are as dry, formal and
disconnected from everyday interests as nonsense syllables are different from real words. This
isolation is the bane of teachers, but a boon to psychologists. It allows psychologists to study a skill
that is much more complex than recalling nonsense syllables, and yet it avoids bringing in a whole
world's worth of associations.

It is worth a moment to review how mathematical procedures are taught in a typical American
school. In the case of subtraction, there are about ten lessons in its lesson sequence. The lesson
sequence introduces the procedure incrementally, one step per lesson, so to speak. For instance, the
first lesson might show how to do subtraction of two-column problems. The second lesson
demonstrates three-column problem solving. The third introduces borrowing, and so on. The ten
lessons are spread over about three years, starting in the late second grade (i.e., at about age seven).
These lessons are interleaved with review lessons and lessons on many other topics. In the
classroom, a typical lesson lasts an hour. The teacher solves some problems on the board with the
class, then the students solve problems on their own. If they need help, they ask the teacher, or
they refer to worked examples in the textbook. A textbook example consists of a sequence of
captioned "snapshots" of a problem being solved, e.g.,

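   Take a ten to          Subtract               Subtract
   make 10 ones.          the ones.              the tens.

   [Three snapshots of the same two-column problem, each subtracting 19. The second
   snapshot shows the ones digit of the answer (6); the third shows the full answer (16).
   The top number and the scratch marks are not legible in this copy.]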



Textbooks have very little text explaining the procedure (young children do not read well).
Textbooks contain mostly examples and exercises.

Math bugs reveal the learning process

Error data are used in testing the theory. There have been many empirical studies of the
errors that students make in arithmetic (Buswell, 1926; Brueckner, 1930; Brownell, 1941; Roberts,
1968; Lankford, 1972; Cox, 1975; Ashlock, 1976). A common analytic notion is to separate
systematic errors from careless errors. Systematic errors appear to stem from consistent application
of a faulty method, algorithm or rule. These errors occur along with the familiar unsystematic or
"careless" errors (e.g., a facts error, such as 7-3=5), or slips as I prefer to call them (c.f. Norman,
1981). Since slips occur in expert performance as well as student behavior, the common opinion is
that they are performance phenomena, an inherent part of the "noise" of the human information
processor. Systematic errors, on the other hand, are taken as stemming from mistaken or missing
knowledge about the skill, the product of incomplete or misguided learning. Only systematic errors
are used in testing the present theory.

Brown and Burton used the metaphor of bugs in computer programs in developing a precise,
detailed descriptive formalism for systematic errors (Brown & Burton, 1978). The basic idea is that
a student's errors can be accurately reproduced by taking some formal representation of a correct
procedure and making one or more small perturbations to it, e.g., deleting a rule. The
perturbations are called bugs. A systematic error is represented by a set of one or more bugs in a
correct algorithm for the skill. Bugs describe systematic errors with unprecedented precision. If a
student makes no slips, then his or her answers on a test will be exactly matched by the buggy
algorithm's answers, digit for digit. To illustrate the notion of bugs, consider the following
problems, which display a systematic error:
    306      80     183     702    3005    7002      34     251
  - 138     - 4    - 95    - 11    - 28   - 239    - 14    - 47
     78      76      88     591    1087    4873      24     244

One could vaguely describe these problems as coming from a student having trouble with
borrowing, especially in the presence of zeros. More precisely, the student misses the problems
that require borrowing from zero. One could say that the student has not mastered the subskill of
borrowing across zero. This description of the systematic error is fine at one level: it is a testable
prediction about what new problems the student will get wrong. It predicts, for example, that the
student will miss 305-117 and will get 315-117 correct. Systematic errors described at this level are
the data upon which several psychological and pedagogical theories have been built (e.g., Durnin &
Scandura, 1977).

Bugs go beyond describing what kinds of exercises the student misses. They describe the
actual answers given. The student whose work appears above has a bug called Borrow-Across-Zero.
A correct subtraction procedure has been perturbed by deleting the step wherein the zero is
changed to a nine during borrowing across zero. This modification creates a procedure for
answering subtraction problems. As a hypothesis, it predicts not only which new problems the
student will miss, but also what the answers will be. For example, it predicts that the student above
would answer 305-117=98 and 315-117=198. Since the bug-based descriptions of systematic errors
predict behavior at a finer level of detail than missing-subskill descriptions, they have the potential
to form a better basis for cognitive theories of learning and problem solving. Bug-based analysis is
used in testing this theory.

It is often the case that a student has more than one bug at the same time. Indeed, the
example given above illustrates co-occurrence of bugs. The last two problems are answered
incorrectly, but the bug Borrow-Across-Zero does not predict their answers (it predicts the two
problems would be answered correctly). A second bug, called Diff-N-N=N, is present. When
the student comes to subtract a column where the top and bottom digits are equal, instead of
writing zero in the answer, the student writes the digit that appears in the column. So the student
has two bugs at once.
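
The two bugs can be made concrete with a small simulation. The following Python sketch is not Debuggy and not the model described in this report; it is a minimal illustration that assumes the ordinary right-to-left borrowing algorithm as the correct procedure. The function name `subtract`, its two flags, and the decision to raise an error when no column is left to borrow from are hypothetical choices made for the example. Each bug is exercised on its own; how the two bugs interact within a single problem is not modeled.

```python
def subtract(top, bottom, borrow_across_zero=False, diff_n_n_is_n=False):
    """Column-by-column subtraction of non-negative integers (top >= bottom).

    borrow_across_zero: hypothetical flag switching on the Borrow-Across-Zero bug.
    diff_n_n_is_n:      hypothetical flag switching on the Diff-N-N=N bug.
    """
    t = [int(d) for d in str(top)][::-1]      # top digits, units column first
    b = [int(d) for d in str(bottom)][::-1]   # bottom digits, units column first
    n_bottom = len(b)                         # columns that really have a bottom digit
    b += [0] * (len(t) - len(b))              # blank bottom cells count as 0

    answer = []
    for i in range(len(t)):
        if t[i] < b[i]:                       # this column needs a borrow
            j = i + 1
            while j < len(t) and t[j] == 0:
                if not borrow_across_zero:
                    t[j] = 9                  # correct step: the zero becomes a nine
                j += 1                        # the bug passes over the zero unchanged
            if j == len(t):
                # Assumption: the sketch stops here rather than guess what a
                # student would do when nothing is left to borrow from.
                raise ValueError("no column left to borrow from")
            t[j] -= 1                         # decrement the column borrowed from
            t[i] += 10                        # add ten to the borrowing column
        if diff_n_n_is_n and i < n_bottom and t[i] == b[i]:
            answer.append(t[i])               # bug: N - N is written as N, not 0
        else:
            answer.append(t[i] - b[i])
    return int("".join(str(d) for d in reversed(answer)))


if __name__ == "__main__":
    for top, bottom in [(306, 138), (305, 117), (315, 117)]:
        print(top, bottom,
              subtract(top, bottom),
              subtract(top, bottom, borrow_across_zero=True))
```

Under these assumptions the Borrow-Across-Zero variant answers 306-138 with 78 and 305-117 with 98, and still gets 315-117 right (198). The Diff-N-N=N variant answers 34-14 with 24 and 251-47 with 244, as in the student's work above.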


Burton developed an automated data analysis program, called Debuggy (Burton, 1981). Using
it, data from thousands of students learning subtraction were analysed, and 76 different kinds of
bugs were observed. Similar studies discovered 68 bugs in addition of fractions (Shaw et al., 1982),
several dozen bugs in simple linear equation solving (Sleeman, forthcoming), and 57 bugs in
addition and subtraction of signed numbers (Tatsuoka & Baillie, 1982).

It is important to stress that bugs are only a notation for systematic errors and not an
explanation. The connotations of "bugs" in the computer programming sense do not necessarily
apply. In particular, bugs in human procedures are unstable. They appear and disappear over
short periods of time, often with no intervening instruction, and sometimes even in the middle of a
testing session (VanLehn, 1981; Bunderson, 1981). Often, one bug is replaced by another, a
phenomenon called bug migration.

Mysteries abound in the bug data. Why are there so many different bugs? What causes
them? What causes them to migrate or disappear? Why do certain bugs migrate only into certain
other bugs? Often a student has more than one bug at a time - why do certain bugs almost
always occur together? Do co-occurring bugs have the same cause? Most importantly, how is the
educational process involved in the development of bugs?

This research was launched partly in order to explain the mysteries just mentioned. The goal
is to give a unified account of what causes students to have just the specific bugs that they do have.
As an illustration of the kind of explanations that the present theory offers, consider a common bug
among subtraction students: the student always borrows from the leftmost column in the problem
no matter which column originates the borrowing. Problem a below shows the correct placement of
borrow's decrement. Problem b shows the bug's placement.



           5               2                   5
   a.    3 6 5       b.    3 6 5       c.      6 5
       - 1 0 9           - 1 0 9             - 1 9
         2 5 6             1 6 6               4 6



(The small numbers represent the student's scratch marks.) Debuggy's name for this bug is
Always-Borrow-Left. It is moderately common: in a sample of 375 students with bugs, six students had
this bug. It has been observed for years (c.f. Buswell, 1926, pg. 173, bad habit number s27).
However, no one has offered an explanation for why students have it. The theory offers the
following explanation, which is based on the hypothesis that students use induction (generalization
of examples) to learn where to place the borrow's decrement. Every subtraction curriculum that I
know of introduces borrowing using only two-column problems, such as problem c above.
Multi-column problems, such as a, are not used. Consequently, the student has insufficient information to
unambiguously induce where to place borrow's decrement. The correct placement is in the
left-adjacent column, as in a. However, two-column examples are also consistent with decrementing the
leftmost column, as in b. If the student chooses the leftmost-column generalization, the student
acquires Always-Borrow-Left rather than the correct procedure. According to this explanation, the
cause of the bug is twofold: (1) insufficiently variegated instruction, and (2) an unlucky choice by
the student.
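
The ambiguity can be checked mechanically. The sketch below is an illustration only, not the model described in this report: the function `subtract_with_rule` and the rule names `"left-adjacent"` and `"leftmost"` are hypothetical, and borrowing from a zero is deliberately left out to keep the example short.

```python
def subtract_with_rule(top, bottom, rule):
    """Column subtraction where `rule` decides which column a borrow decrements.

    rule: "left-adjacent" (the correct placement) or "leftmost"
          (the placement described for Always-Borrow-Left). Hypothetical names.
    """
    if rule not in ("left-adjacent", "leftmost"):
        raise ValueError("unknown rule: " + rule)
    t = [int(d) for d in str(top)][::-1]      # top digits, units column first
    b = [int(d) for d in str(bottom)][::-1]   # bottom digits, units column first
    b += [0] * (len(t) - len(b))              # blank bottom cells count as 0

    answer = []
    for i in range(len(t)):
        if t[i] < b[i]:                       # this column originates a borrow
            j = i + 1 if rule == "left-adjacent" else len(t) - 1
            if j <= i or j >= len(t) or t[j] == 0:
                # Assumption: cases the simplified rules cannot handle are
                # reported rather than repaired.
                raise ValueError("cannot place the decrement")
            t[j] -= 1                         # place the borrow's decrement
            t[i] += 10
        answer.append(t[i] - b[i])
    return int("".join(str(d) for d in reversed(answer)))


if __name__ == "__main__":
    for top, bottom in [(65, 19), (365, 109)]:
        print(top, bottom,
              subtract_with_rule(top, bottom, "left-adjacent"),
              subtract_with_rule(top, bottom, "leftmost"))
```

On the two-column problem 65-19 both rules give 46, so a lesson built from such problems cannot tell them apart. On 365-109 the left-adjacent rule gives 256 and the leftmost rule gives 166, the answers shown in problems a and b.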

The bugs that students exhibit are important data for developing the theory. These bugs will
be called observed bugs. Equally important are bugs that students don't exhibit. When there are
strong reasons to believe that a bug will never occur, it is called a star bug (after the linguistic
convention of placing a star before sentences that native speakers would never utter naturally). Star
bugs, and star data in general, are not as objectively attainable as ordinary data (VanLehn, Brown &
Greeno, in press). But they are quite useful. To see this, consider again the students who are
taught borrowing on two-column problems, such as problem c above. In two-column problems, the
borrow's decrement is always in the tens column. Hence "tens column" is an inductively valid
description of where to decrement. However, choosing "tens column" for the decrement's
description predicts that the student would place the decrement in the tens column regardless of
where the borrow originates. This leads to strange solutions, such as d and e below:



            5                  15
   d.   1 5 6 5         e.    3 6 5
      -   9 1 0             - 1 9 0
        1 6 5 5               2 6 5



To my knowledge, this kind of problem solving has never been observed. In the opinion of several
expert diagnosticians, it never will be observed. Always decrementing the tens column is a star bug.
The theory should not predict its occurrence. This has important implications for the theory. The
theory must explain why certain inductively valid abstractions (e.g., leftmost column) are used by
students while certain other abstractions (e.g., tens column) are not.
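
The claim that all three descriptions are inductively valid for two-column lessons can be written out as a small consistency check. This is a hedged sketch, not part of the theory's model: the encoding of an example as a triple, the names in `CANDIDATES`, and the function `consistent` are all invented for illustration.

```python
# Columns are numbered from the right: 0 = units, 1 = tens, 2 = hundreds.
# Each candidate maps (column that originates the borrow, number of columns)
# to the column that receives the borrow's decrement.
CANDIDATES = {
    "left-adjacent column": lambda origin, n_columns: origin + 1,
    "leftmost column": lambda origin, n_columns: n_columns - 1,
    "tens column": lambda origin, n_columns: 1,
}


def consistent(examples):
    """Return the candidate descriptions that agree with every example.

    Each example is a triple (origin, n_columns, decremented_column).
    """
    return [name for name, rule in CANDIDATES.items()
            if all(rule(origin, n) == target for origin, n, target in examples)]


# Two-column borrowing: the units column borrows, the tens column is decremented.
two_column = [(0, 2, 1)]

# Hypothetical three-column examples worked correctly.
three_column = two_column + [(0, 3, 1), (1, 3, 2)]

if __name__ == "__main__":
    print(consistent(two_column))
    print(consistent(three_column))
```

With only two-column examples, all three descriptions survive; once correct three-column examples are added, only the left-adjacent description does. The check shows that the examples alone cannot separate the candidates. It says nothing about why students adopt one surviving candidate and never another, which is the question the theory has to answer.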

These examples have illustrated one side of the research problem: to understand certain
aspects of skill acquisition (i.e., knowledge integration/communication) by studying bugs. The next
subsection is a brief discussion of the theory. It concentrates on the insights that have been
obtained into how buggy procedures are acquired.

Step theory, repair theory and felicity conditions

For historical and other reasons, it is best to view the present theory as an integration of two
theories. Step theory describes how students acquire procedures from instruction. Repair theory
describes how students barge through situations where their procedure has reached an impasse.*
The two theories share the same representations of knowledge and much else. I will continue to
refer to them together as "the theory."

Repair theory is based on the insight that students do not treat procedures as hard and fast
algorithms. If they are unsuccessful in an attempt to apply a procedure to a problem, they are not
apt to just quit, as a computer program does. Instead, they will be inventive, invoking certain
general purpose tactics to change their current process state in such a way that they can continue
the procedure. These tactics are simple ones, such as skipping an operation that can't be performed
or backing up in the procedure and taking another path. Such local problem solving tactics are
called repairs because they fix the problem of being stuck. They do not fix the underlying cause of
the impasse. Given a similar exercise later, the student will reach the same impasse. On this
occasion, the student might apply a different repair. This shifting among repairs is one explanation
of bug migration. A remarkable early success of repair theory was predicting the existence of this
kind of bug migration before the phenomenon was observed in the data.

Step theory is based on the insight that classroom learning is like a conversation in that there
are certain implicit conventional expectations, called felicity conditions, that facilitate information
transmission. In this domain, the felicity conditions all seem to reflect a single basic idea: students
expect that the teacher will introduce just one new "piece" of the procedure per lesson, and that
such "pieces" will be "simple" in certain ways. Although students do not have strong expectations
about what procedures will be taught, they have strong expectations about how procedures will be
taught. Step theory takes its name from a slogan that expresses the students' expectations:
procedures are taught one simple step at a time. Several felicity conditions have been discovered,
including:

1. Students expect a lesson to introduce at most one new "piece" of procedure that is, roughly
speaking, one disjunct of a disjunction. Such "pieces" are called subprocedures. This felicity
condition will be described in more detail in a moment.

2. Students induce their new subprocedure from examples and exercises. That is, students
expect the lesson's material to correctly exemplify the lesson's target subprocedure.

3. The students expect the lesson to "show all the work" of the target subprocedure. This
felicity condition, called the show-work principle, requires a little more explanation. Suppose
a target subprocedure will ultimately involve holding some intermediate result mentally, as
when solving 3+4+5, one holds the intermediate result 7 mentally. When this subprocedure
is introduced, the show-work principle mandates that the lesson's examples write the
intermediate result down. In a later lesson, the students may be taught to omit the extra
writing by holding the intermediate result mentally.



* John Seely Brown originated repair theory (Brown & VanLehn, 1980). The present version 
remains true to the insights of the original version although most of the details have changed. 
Austin (1962) invented felicity conditions as a way of analyzing ordinary conversations. A typical
linguistic felicity condition is: In normal conversation, the speaker uses a definite noun phrase only
if the speaker believes the listener can unambiguously determine the noun phrase's referent.
Typically, neither the speaker nor the hearer is aware of such constraints. Yet, if a conversation
violates a felicity condition, it is somehow marked, e.g., by the speaker appearing sarcastic or the
hearer misunderstanding the speaker. Austin's idea has been developed by Searle (1969), Grice
(1975), Gordon and Lakoff (1971), Cohen and Perrault (1979) and many others. It has become a
whole new field, discourse analysis. The present work is, to my knowledge, the first application of
ideas from discourse analysis to the study of learning. Two key ideas have been imported:

1. Felicity conditions are operative constraints on human behavior despite the fact that the
participants are not aware of them. Textbook authors probably do not consciously realize that
the lessons they write obey, e.g., the show-work principle. They strive only to make the
lessons effective. So too, the speakers in a conversation try only to communicate effectively,
and are not aware that they obey certain felicity conditions.

2. The seeming purpose of felicity conditions is to expedite communication. In particular, there
seem to be certain inherent problems that the listener (student) must solve. For instance,
whenever a speaker uses a noun phrase, the listener must decide whether it refers to a
previously mentioned object or to one that is new to the conversation. The felicity condition
mentioned a moment ago helps the listener decide: If I say "the theory" right now, you will
probably take it to mean the one presented in this paper. If I say "a theory," you will
probably take it to mean a hitherto unmentioned theory that I will soon be telling you
something about. Felicity conditions expedite communication by helping the listener solve
inherent problems. Felicity conditions do not usually solve the inherent problem for the
listener (student), but they do simplify the listener's task. An inherent problem for classroom
knowledge communication will be discussed in a moment.

Other ideas from discourse analysis (e.g., conversational implicature, the deliberate violation of a
felicity condition in order to achieve a special effect) have not yet found analogs in the domain of
classroom learning. It remains an open question just how far the analogy between conversation and
multi-lesson knowledge communication will go.

1.2 The methodological goal: competitive arguments for each hypothesis 

The rise of AI has given psychology the tools to build computer programs that apparently
simulate complex forms of cognition, such as skill acquisition, at a level of detail and precision that
is orders of magnitude greater than that achieved by earlier models of cognition. Unfortunately, the
potential of AI models to explain human learning (or other kinds of cognition) is largely unrealized
due to methodological weaknesses. Until recently, it was rare for a model to be analyzed and
explicated in terms of individual hypotheses. One was asked to accept the model in toto. Critics
have pointed out that a typical AI/Simulation "explanation" of intelligent behavior is to substitute
one black box, a complex computer program, for another, the human mind (Kaplan, 1981). Efforts
at explicating programs have increased recently. Although extracting the hypotheses behind the
design of the model is a necessary first step, many other issues remain to be addressed: What are
the relationships between the hypotheses and the behavior? Could the given cognition be simulated
if the hypotheses were violated or replaced by somewhat different ones? Would such a change
produce inconsistency, or a plausible but as yet unobserved human behavior, or merely a minor
perturbation in the predictions? Which alternatives, if any, can now be rejected in favor of the
chosen hypotheses? The connection of explicit hypotheses to the data seems to me to be critical to
progress in computational theories of cognition. The emphasis must be on the connection;
explication alone is only a beginning.

AI has given psychology a new way of expressing models of cognition that is much more
detailed and precise than its predecessors. Unfortunately, the increased detail and precision in
stating models has not been accompanied by correspondingly detailed and precise arguments
analyzing and supporting them. Consequently, the new, richly detailed models of cognitive science
often fail to meet accepted criteria of scientific theories. It is not new to point out that current
theorizing based on computational models of cognition has been lax in providing such support
(Pylyshyn, 1980; Fodor, 1981). Perhaps what is new would be to give a complex AI-based theory
proper support. Not only would this be interesting in itself, but it would show that there is nothing
inherent in AI-based models that prevents their use in scientifically acceptable theories.

The methodological goal of this research is to give an AI-based theory proper support. By
support, I refer to various traditional forms of scientific reasoning such as showing that specified
empirical phenomena provide positive or negative evidence regarding hypotheses, showing that an
assumption is needed to maintain empirical content and falsifiability, or showing that an assumption
has consequences that are contradictory or at least implausible. However, one form of support has
turned out to be particularly useful. I have found that the internal structure of the theory (the
way the hypotheses interact to entail empirical coverage) comes out best when the theory is
compared with other theories and with alternative versions of itself. That is, a key to supporting
this theory is competitive argumentation. In practice, most competitive arguments have a certain
"king of the mountain" form. One shows that a hypothesis accounts for certain facts, and that
certain variations or alternatives to the hypothesis, while not without empirical merit, are flawed in
some way. That is, the argument shows that its hypothesis stands at the top of a mountain of
evidence, then proceeds to knock the competitors down. Two examples of competitive arguments
will be presented so that the remaining discussion of the validation problem can be conducted on a
more concrete footing.

An argument for one-disjunct-per-lesson

Consider the first felicity condition listed a moment ago. A more precise statement of it is:
Learning a lesson introduces at most one new disjunct into a procedure. In procedures, a disjunction
may take many forms, e.g., a conditional branch (if-then-else). This felicity condition asserts that
learners will only learn a conditional if each branch (disjunct) of the conditional is taught in a
separate lesson, i.e., the then-part in one lesson, and the else-part in another.

The argument for the felicity condition hinges on an independently motivated hypothesis:
mathematical procedures are learned inductively. They are generalized from examples. There is an
important philosophical-logical theorem concerning induction: If a generalization (a procedure, in
this case) is allowed to have arbitrarily many disjuncts, then an inductive learner can identify which
generalization it is being taught only if it is given all possible examples, both positive and negative.
This is physically impossible in most interesting domains, including this one. If inductive learning
is to bear even a remote resemblance to human learning, disjunctions must be constrained.
Disjunctions are one of the inherent problems of knowledge communication that were mentioned
earlier.

Two classic methods of constraining disjunctions are (i) to bar disjunctions from
generalizations, and (ii) to bias the learner in favor of generalizations with the fewest disjuncts. The
felicity condition is a new method. It uses extra input information, the lesson boundaries, to control
disjunction. Thus, there are three competing hypotheses for explaining how human learners control
disjunction (along with several other hypotheses that won't be mentioned here): (i) no-disjunctions,
(ii) fewest-disjuncts, and (iii) one-disjunct-per-lesson.
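
The force of the theorem about unbounded disjunction can be seen in a toy setting. The following sketch is only an illustration under invented assumptions: the "domain" is the integers 0 to 7, a generalization is any subset of the domain, and a disjunct is a maximal run of consecutive integers. None of these choices, and none of the function names, come from the theory itself.

```python
from itertools import product

DOMAIN = range(8)  # toy universe of possible examples (an assumption)


def n_disjuncts(hypothesis):
    """Count maximal runs of consecutive integers in the hypothesis."""
    return sum(1 for x in hypothesis if x - 1 not in hypothesis)


def consistent_hypotheses(positives, negatives, max_disjuncts=None):
    """Enumerate every subset of DOMAIN that fits the labelled examples."""
    found = []
    for bits in product([False, True], repeat=len(DOMAIN)):
        h = frozenset(x for x in DOMAIN if bits[x])
        if positives <= h and not (negatives & h):
            if max_disjuncts is None or n_disjuncts(h) <= max_disjuncts:
                found.append(h)
    return found


def determined(hypotheses, seen):
    """Unseen points on which all remaining hypotheses agree, with the label."""
    return {x: (x in hypotheses[0])
            for x in DOMAIN
            if x not in seen and len({x in h for h in hypotheses}) == 1}


if __name__ == "__main__":
    positives, negatives = {2, 4}, {6}
    seen = positives | negatives
    unbounded = consistent_hypotheses(positives, negatives)
    single = consistent_hypotheses(positives, negatives, max_disjuncts=1)
    print(len(unbounded), determined(unbounded, seen))
    print(len(single), determined(single, seen))
```

With disjuncts unbounded, the learner in this toy setting can conclude nothing about any example it has not seen. Limiting the generalization to a single disjunct already settles two unseen points (3 must be included and 7 excluded). This mirrors, in miniature, why some constraint on disjunction is needed before induction from a finite lesson sequence can succeed.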

Competitive argumentation involves evaluating the entailments of each of the three
hypotheses. It can be shown that the first hypothesis should be rejected because it forces the theory
to make absurd assumptions about the student's initial set of concepts, the primitive concepts from
which procedures are built. The empirical predictions of the other two hypotheses are identical
given the lesson sequences that occur in the data, so more subtle arguments are needed to
differentiate between them. Here are two:

(1) The one-disjunct-per-lesson hypothesis explains why lesson sequences have the structure that
they do. If the fewest-disjuncts hypothesis were true, then it would simply be an accident
that lesson boundaries fall exactly where disjuncts were being introduced. The one-disjunct-
per-lesson hypothesis explains a fact (lesson structure) that the fewest-disjuncts hypothesis
does not explain.

(2) The fewest-disjuncts hypothesis predicts that students would learn equally well from a
"scrambled" lesson sequence. To form a scrambled lesson sequence, all the examples in an
existing lesson sequence are randomly ordered, then chopped up into hour-long lessons. Thus,
the lesson boundaries fall at arbitrary points. (To avoid a confound, the scrambling should
not let examples from late lessons come before examples from early lessons.) The fewest-
disjuncts hypothesis predicts that the bugs that students acquire from a scrambled lesson
sequence would be the same as the bugs they acquire from the unscrambled lesson sequence.
This empirical prediction needs checking. If it is false, as I am sure it is, then the fewest-
disjuncts hypothesis can be rejected on empirical as well as explanatory grounds.
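
The construction of a scrambled lesson sequence in argument (2) can be sketched in Python. This is only one reading of the description, not a procedure given in the text: examples are shuffled within their original lesson (so nothing from a late lesson precedes anything from an early one), and the result is re-chopped into lessons of a fixed size. The function name `scramble`, the `lesson_size` parameter, and the example labels are all assumptions made for illustration.

```python
import random


def scramble(lessons, lesson_size, seed=0):
    """Shuffle examples within each original lesson, then re-chop at arbitrary points.

    `lessons` is a list of lists of examples. Shuffling only inside each lesson
    keeps examples from later lessons from preceding examples of earlier ones.
    """
    rng = random.Random(seed)
    examples = []
    for lesson in lessons:
        shuffled = list(lesson)
        rng.shuffle(shuffled)        # reorder within the original lesson only
        examples.extend(shuffled)
    # New boundaries fall every `lesson_size` examples, regardless of content.
    return [examples[i:i + lesson_size] for i in range(0, len(examples), lesson_size)]


# Hypothetical lesson sequence: three lessons of three labelled examples each.
original = [["a1", "a2", "a3"], ["b1", "b2", "b3"], ["c1", "c2", "c3"]]
print(scramble(original, lesson_size=2))
```

With a lesson size of two, the new boundaries no longer coincide with the points where the original lessons changed topic, which is the property the argument relies on.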

This brief competitive argument sketches the kind of individual support that each of the theory's
hypotheses should be given. Such argumentation seems essential for demonstrating the
psychological validity of a theory of this complexity.

However, many hypotheses of the theory are so removed from empirical predictions that it is
difficult to show that they are well-motivated. This is particularly true with the hypotheses that
define the representation used for the student's procedural knowledge. AI models of cognition
invariably use some knowledge representation language. It is widely recognized that the architecture
of the knowledge representation has subtle, pervasive effects on the model and the model's
empirical accuracy. Despite this belief, most discussions of knowledge representation have been
conducted on non-empirical grounds (e.g., Can the knowledge representation cleanly express the
distinction between the generic concept "elephant," the set of all elephants, and a prototypical
elephant?). Knowledge representations have been treated as notational systems, but they can be
taken as theoretical assertions. It is clear that the mind contains information, and it is plausible that
that information is structured. As Fodor (1975) points out, it makes sense to ask if the structure of
the mind's information is the same as the structure of the model's information, where the structure
of the model's information is defined by the knowledge representation language. It is just as
sensical and important to ask whether hypotheses of knowledge representation are psychologically
true as it is to ask whether hypotheses of learning or problem solving processes are psychologically
true. However, it is considerably more difficult to ascertain the truth of hypotheses of knowledge
representation since their impact on observable predictions is often quite indirect.

A major goal of the present research is to provide empirical arguments defending each
principle of the theory, including the principles that define the knowledge representation. The
arguments for the knowledge representation are intricate and depend crucially on other, more easily
defended principles, such as the felicity conditions. The following section sketches one of the
simplest arguments.

One-disjunct-per-lesson entails a goal stack

The argument starts with the one-disjunct-per-lesson hypothesis, brings in some data, and
concludes that students' procedures employ goal stacks. A goal stack allows a procedure to be
recursive. For instance, a recursive procedure for doing borrowing is:

Regroup (C) =
   1. Add 10 to the top digit of column C.
   2. BorrowFrom (the next column to the left of C).

BorrowFrom (C) =
   1. If the top digit of column C is zero, then Regroup (C).
   2. Decrement the top digit of column C by one.

It has two goals, Regroup and BorrowFrom, both taking a column as an argument. This
procedure generates the following problem state sequence:

   a.  5  0 13      b.  5 10 13      c.  4 10 13      d.  4  9 13      ...

(Only the top row of the problem is shown at each state; the bottom row, 66, is unchanged.)

States b and c result from a recursive invocation of the goal Regroup. A goal stack is needed to
maintain the distinction between the two invocations of Regroup so that, for instance, the
invocation of BorrowFrom on the hundreds column (yielding state c) returns to the right
invocation of Regroup. This recursive procedure can borrow across arbitrarily many zeros,* e.g.,

   a.  5  0  0 13      b.  5  0 10 13      c.  5 10 10 13

   d.  4 10 10 13      e.  4  9 10 13      f.  4  9  9 13      ...
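
The recursive procedure can be written out as a short Python sketch to check that it yields the state sequences described. This is an illustration under stated assumptions, not Sierra's representation: the top row is held as a list of integers with index 0 as the leftmost column, a digit that has had 10 added is stored as its numeric value (for example 13), and the function names simply mirror the goals Regroup and BorrowFrom. The sketch assumes some column to the left holds a nonzero digit.

```python
def regroup(top, c, states):
    """Regroup(C): add 10 to the top digit of column c, then borrow from its left neighbor."""
    top[c] += 10
    states.append(tuple(top))
    borrow_from(top, c - 1, states)


def borrow_from(top, c, states):
    """BorrowFrom(C): if the top digit is zero, regroup it first; then decrement it."""
    if top[c] == 0:              # the procedure's only conditional
        regroup(top, c, states)  # recursive invocation of Regroup
    top[c] -= 1
    states.append(tuple(top))


def borrow_states(digits):
    """Return the top row after each step of regrouping the units column."""
    top = list(digits)
    states = []
    regroup(top, len(top) - 1, states)
    return states


print(borrow_states([5, 0, 3]))
# [(5, 0, 13), (5, 10, 13), (4, 10, 13), (4, 9, 13)]
print(borrow_states([5, 0, 0, 3])[-1])
# (4, 9, 9, 13)
```

Python's call stack plays the role of the goal stack here: each pending call to `borrow_from` remembers which column it still has to decrement.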

The problem state sequences just given are exactly how many students borrow across zero. But this
does not prove that they have a recursive borrowing procedure. They could, for instance, have a
borrow procedure with two loops: one loop moves left, adding tens to zeros; the second loop moves
right, decrementing as it goes:

Regroup (C) =
   1. OriginalC <- C.
   2. Add 10 to the top digit of column C.
   3. C <- the next column to the left of C.
   4. If the top digit of column C is zero, go to step 2.
   5. Decrement the top digit of column C.
   6. C <- the next column to the right of C.
   7. If C ≠ OriginalC then go to step 5.

This two-loop procedure is not recursive. A goal stack is not needed to interpret it. So, two very
different procedural structures are both consistent with student problem solving behavior. The one-
disjunct-per-lesson hypothesis provides a way to tell which knowledge structure students have.
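
For comparison, the two-loop procedure can be sketched the same way. As before, this is an illustrative sketch rather than the model's own code: the top row is a list of integers with index 0 as the leftmost column, and the step numbers in the comments refer to the seven-step listing above. No recursion is used, and each loop needs its own exit test.

```python
def regroup_two_loop(digits):
    """Run the two-loop Regroup on the units column; return the top row after each step."""
    top = list(digits)
    states = []
    original_c = len(top) - 1
    c = original_c                   # step 1
    while True:                      # left-moving loop
        top[c] += 10                 # step 2
        states.append(tuple(top))
        c -= 1                       # step 3
        if top[c] != 0:              # step 4: first conditional
            break
    while True:                      # right-moving loop
        top[c] -= 1                  # step 5
        states.append(tuple(top))
        c += 1                       # step 6
        if c == original_c:          # step 7: second conditional
            break
    return states


print(regroup_two_loop([5, 0, 3]))
# [(5, 0, 13), (5, 10, 13), (4, 10, 13), (4, 9, 13)]
```

The printed states are the same ones the recursive procedure produces for this problem, which is why observed behavior alone cannot separate the two structures.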



* The maximum depth that the goal stack achieves while solving a problem is proportional to the
number of zeros in the problem. Since students can solve problems with arbitrarily many zeros, the
goal stack has no apparent maximum depth. Evidently, this goal stack is not the same as the one(s)
that are hypothesized to underlie other kinds of cognition, e.g., parsing center-embedded English
sentences such as R. Stallman's pun, "The bug the mouse the cat ate bit bites."






The recursive procedure has one disjunction: the conditional statement in BorrowFrom. The
two-loop procedure has two disjunctions: the conditional statements on lines 4 and 7. This is not
an accident. Any two-loop procedure must have two disjunctions, one to terminate each loop. A
recursive procedure can always get away with one disjunction. In essence, the goal stack
automatically performs the second, right-moving loop as it pops.
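
The claim that the goal stack performs the right-moving loop as it pops can be made concrete with an explicit stack. The sketch below is a simplification made for illustration: instead of storing full Regroup and BorrowFrom goals, the stack holds only the columns whose decrement is still pending. The name `regroup_with_stack` and the returned tuple are assumptions, not part of the theory.

```python
def regroup_with_stack(digits):
    """Borrow for the units column using an explicit stack of pending decrements.

    Returns (final top row, order in which columns were decremented, maximum stack depth).
    """
    top = list(digits)               # index 0 is the leftmost column
    stack = []                       # columns that still owe a decrement
    max_depth = 0
    c = len(top) - 1
    while True:                      # descent: Regroup adds 10, then calls BorrowFrom
        top[c] += 10
        c -= 1
        stack.append(c)
        max_depth = max(max_depth, len(stack))
        if top[c] != 0:              # the single conditional of the recursive procedure
            break
    order = []
    while stack:                     # unwinding: each pop performs one decrement
        col = stack.pop()
        top[col] -= 1
        order.append(col)
    return top, order, max_depth


print(regroup_with_stack([5, 0, 0, 3]))
# ([4, 9, 9, 13], [0, 1, 2], 3)
```

The decrements come out in left-to-right column order without any test on the column index, and the maximum depth grows by one for each additional zero in the top row.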

Since only one disjunct is introduced per lesson, and the two procedures have different
disjunctions, the two procedures will require different lesson sequences in order to be learned. In
particular, the recursive procedure can be learned with a single lesson, assuming that the learner
already knows how to borrow from nonzero digits (i.e., the student can solve 57-9). The lesson
would have examples such as the problem state sequences given above. On the other hand, the
two-loop procedure could only be learned using two lessons. The first lesson would introduce just
the left-moving loop. It might use an example such as

   a.  5  0  0 13      b.  5  0 10 13      c.  5 10 10 13

which only does part of regrouping and stops. The second lesson would complete the teaching of
the procedure by showing how to do the right-moving loop. It might use an example such as

   a.  5  0  0 13      b.  5  0 10 13      c.  5 10 10 13

   d.  4 10 10 13      e.  4  9 10 13      f.  4  9  9 13      ...

(Only the top row of the problem is shown at each state.)

At this point in the argument, a difficult knowledge structure issue has been reduced to an entirely
empirical question: if students have a two-loop procedure, then they must have been taught it with
a lesson sequence like the two-lesson sequence above. On the other hand, if the single-lesson
sequence is the only one in use, then students must have a recursive procedure. Now for the punch
line: No subtraction curriculum that I have examined uses the two-lesson sequence. The curricula
all use the other one. The data support the hypothesis that students have a recursive borrowing
procedure, and hence, a goal stack.

An important hypothesis about knowledge representation has been supported by an
entailment of the one-disjunct-per-lesson hypothesis in conjunction with a simple empirical
observation. Discovering such entailments is perhaps the most important contribution that this
research has to make. Most of this document is devoted to describing them. (In particular, two
other arguments supporting the goal-stack hypothesis will be presented.) However, these arguments
are often quite a bit more complex than the ones given above. The bug data are particularly tricky,
which is why they have been avoided here. Complex inferences are a steep price (witness the
length of this document!). Are they really necessary, or is there a "shallow" theory, one with more
easily tested assertions, that will account for learning in this domain? I think not. The next
subsection explains why.

Why not use a shallow theory? 

The complexity in the theory's verification derives from the ambition that problem solving
knowledge be described in enough detail to actually solve problems. To see this, consider the fate
of a particular shallow theory, one of a class of stochastic learning models that explain the
ubiquitous learning curves of skill acquisition (see Newell & Rosenbloom, 1981, for a review). A
typical model has a pool of responses, some correct and some incorrect. The subject's response is
drawn probabilistically from this pool. Learning curves are explained by simple functions which
add or replace items in the pool depending on the reinforcement given the learner. Consider what
it would mean to apply such a model literally to learning the skill of subtraction. A subtraction
response is a sequence of writing actions. Let's say that each time a student observes the teacher's
examples or answers a subtraction exercise correctly, the action sequence is added to the pool. To
answer a problem, the student merely draws an action sequence from the pool and executes it.
Clearly, the student's solution would have little to do with the problem, and there would never be
any learning. The model makes an absurd prediction. Although one could augment the model by
associating stimuli patterns with each action sequence in the pool, it's clear that there are far too
many subtraction problems for this to work. One would have to postulate a matcher that finds the
closest pool item to the problem. At last we have something that has a prayer of predicting the
data. But now all the interesting, theoretical machinery is hiding in the matcher. Many learning
phenomena can be generated just by manipulating the matcher and the encoding it uses for the
stimuli. The shallowness of the theory has vanished. To validate the architecture of the matcher
and the representation of stimuli would require the kinds of deep inferences that this approach was
supposed to avoid. The only way to get a shallow model to work that I can see is to ignore the
details of the response that the subjects make, and simply classify their response as right or wrong.
This gross description allowed the stochastic models to predict the appropriate learning curves with
some degree of accuracy. This simplification to one-bit responses, right versus wrong, characterizes
much research on skill acquisition and virtually all educational research on mathematical skills.
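
A minimal sketch of the kind of stochastic pool model described above may help fix ideas. It is not any specific published model: the pool size, the replacement rule (an incorrect response that is drawn gets replaced by a correct one), and the function name `simulate_pool` are all assumptions chosen for illustration. Responses are reduced to one bit, right or wrong, which is exactly the simplification the paragraph identifies.

```python
import random


def simulate_pool(pool_size=20, trials=200, seed=0):
    """Simulate a one-bit response pool and return the probability correct after each trial."""
    rng = random.Random(seed)
    pool = [False] * pool_size           # False = incorrect response, True = correct
    curve = []
    for _ in range(trials):
        i = rng.randrange(pool_size)     # the response is drawn at random from the pool
        if not pool[i]:
            pool[i] = True               # reinforcement replaces the incorrect item
        curve.append(sum(pool) / pool_size)
    return curve


curve = simulate_pool()
print(curve[0], curve[-1])
```

The curve rises quickly at first and then flattens, the familiar shape of a learning curve, yet nothing in the pool says which writing actions make up a response. That missing detail is what the subtraction data require.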

It was just shown that a shallow theory would not work for this domain. That's unfortunate.
When theories are shallow, then argumentation is easy. In a sense, the data do the arguing for you.
Most experimental psychology is like this. The arguments are so direct that the only place they can
be criticized is at the bottom, where the raw data is interpreted as findings. Experimental design
and data analysis techniques are therefore of paramount importance. The reasoning from finding to
theory is often short and impeccable. On the other hand, when theories are deep in that the
derivation of predictions from remote structures is long and complex, argumentation becomes
lengthy and intricate. However, the effort spent in forging them is often repaid when the
arguments last longer than the theory. Indeed, each argument is almost a micro-theory. An
argument's utility may often last far longer than the utility of the theory it supports. (For examples,
see the discussion of crucial facts in VanLehn, Brown & Greeno, in press.) As an illustration of the
transition from shallow to deep theories, linguistics provides a particularly good example.

From shallow theories and deep non-theories, towards deep theories

Prior to Chomsky, syntactic theories were rather shallow and almost taxonomic in character.
The central concern was to tune a grammar to cover all the sentences in a given corpus. Arguments
between alternative grammars could be evaluated by determining which sentences in the corpus
could be analyzed by each. When Chomsky reshaped syntax by postulating abstract remote
structures, namely a base grammar and transformations, argumentation had to become much more
subtle. Since transformations interacted with each other and the base grammar in complex ways, it
was difficult to evaluate the empirical impact of alternative formulations of rules. Theories of
syntax changed constantly and gradually as interactions were uncovered. What has been retained
from the early days is not whole theories, but a loosely defined collection of crucial facts and
arguments.

As Moravcsik has pointed out (Moravcsik, 1980), Chomskyan linguistics is virtually alone
among the social sciences in employing deep theories. Moravcsik labels theories "deep" (without
implying any depth in the normative sense) if they "refer to many layers of unobservables in their
explanations.... 'Shallow' theories are those that try to stick as close to the observables as possible,
[and] aim mostly at correlations between observables.... The history of the natural sciences like
physics, chemistry, and biology is a clear record of the success story of 'deep' theories.... When we
come to the social sciences, we encounter a strange anomaly. For while there is a lot of talk about
aiming to be 'scientific,' one finds in the social sciences a widespread and unargued for predilection
for 'shallow' theories of the mind." (Moravcsik, 1980, pg. 28)

AI-based modelling efforts are certainly deep in that they postulate remote structures and
mechanisms that are quite unobservable. However, very few efforts, if any, could be called proper
theories. They lack arguments connecting their remote structures to empirical findings. In one
sense, the history of AI-based cognitive research is the dual of linguistics' history. Throughout its
history, linguistics has had a strong empirical tradition. Only lately has it adopted deep theorizing.
On the other hand, AI has always had a strong tradition of deep modelling, and only recently has it
begun to connect its models to observations. The present research effort is intended to be another
step in that direction: putting AI-based, deep models on firm empirical footings.

1.3 Overview of the theory and the document 

The preceding sections indicated the kind of skill acquisition under study, sketched a few
hypotheses about it, and discussed the validation method. This section summarizes the research
project by listing its main components.

(1) Learning model. The first component is a learning model: a large, AI-based computer
program. Its input is a lesson sequence. Its output is the set of bugs that are predicted to occur
among students taking the curriculum represented by the given lesson sequence. The program,
named Sierra, has two main parts: (i) The learner learns procedures from lessons. (ii) The solver
applies a procedure to solve test problems. The solver is a revised version of the one used to
develop repair theory (Brown & VanLehn, 1980). The learner is similar to other AI programs that
learn procedures inductively. For instance, ALEX (Neves, 1981) learns procedures for solving
algebraic equations given examples similar to ones appearing in algebra textbooks. LEX (Mitchell et
al., 1983) starts with a trial-and-error procedure for solving integrals and evolves a more efficient
procedure as it solves practice exercises. Sierra's learner is similar to LEX and ALEX in some ways
(e.g., it uses disjunction-free induction). It differs in other ways (e.g., it uses lesson boundaries
crucially, while the instruction input to ALEX and LEX is a homogeneous sequence of examples and
exercises). In particular, Sierra is the first AI learner to use rate constraints (described in the next
chapter). As a piece of AI, Sierra's learner is a modest contribution. Of course, the goal of this
research is not to formulate new ways that AI programs can learn.

(2) Data from human learning. The data used to test the theory come from several sources:
the Buggy studies of 2463 students learning to subtract multidigit numbers (Brown & Burton, 1978;
VanLehn, 1982), a study of 500 students learning to add fractions (Tatsuoka & Bailie, 1983), and
various studies of algebra errors (Greeno, 1982; Wenger, 1983). The data from subtraction play the
most prominent role since they derive from the largest sample and the most objective analysis
methods. Bugs from the other procedural skills play the secondary, but still important, role of
testing the across-task generality of the theory. As of this writing, only the subtraction data have
been analyzed. A formal assessment of the theory's task generality must be delayed for another
report.

(3) A comparison of the model's predictions to the data. The major empirical criterion for
the theory is observational adequacy: (i) the model should generate all the correct and buggy
procedures that human learners exhibit, and (ii) the model should not generate procedures that
learners do not acquire, i.e., star bugs. Although observational adequacy is a standard criterion for
generative theories of natural language syntax, this is the first AI learning theory to use it.

(4) A set of hypotheses. As mentioned above, early AI-based theories of cognition used
only the three components listed so far: a model, some data, and a comparison of some kind. Such
an "explanation" of intelligent human behavior amounts to substituting one black box, a complex
computer program, for another, the human mind. Recent work in automatic programming and
program verification suggests a better way to use programs in cognitive theories: The theorist
develops a set of specifications for the model's performance. These serve as the theory's hypotheses
about the cognition being modelled. The model becomes a tool for calculating the predictions
made by the combined hypotheses. The present theory has 32 such hypotheses. The felicity
conditions listed earlier are three of the 32. The goal stack hypothesis is another.

(5) A demonstration that the model generates all and only the predictions allowed by the
hypotheses. Such a demonstration is necessary to ensure that the success or failure of the model's
predictions can be blamed on the theory's hypotheses and not on the model's implementation.
Ideally, I would prove, line-by-line, that the model satisfies the hypotheses. This just isn't practical
for a program as complex as Sierra. However, what has been done is to design Sierra for
transparency instead of efficiency. For instance, Sierra uses several generate-and-test loops where
the tests are hypotheses of the theory. This is much less efficient than building the hypotheses into
the generator.* But it lends credence to the claim that the model generates exactly the predictions
allowed by the hypotheses.


(6) A set of arguments, one for each hypothesis, that shows why the hypothesis should be in
the theory, and what would happen if it were replaced by a competing hypothesis. This involves
showing how each hypothesis, in the context of its interactions with the others, increases
observational adequacy, or reduces degrees of freedom, or improves the adequacy of the theory in
some other way. The objective is to analyze why these particular hypotheses produce an empirically
successful theory. This comes out best in competitive argumentation. Each of the 32 hypotheses of
the theory has survived a competitive argument.
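
Components (3) and (5) can be illustrated together with a toy Python sketch: a generate-and-test loop whose tests are the hypotheses, followed by a set comparison that scores observational adequacy. Nothing here reproduces Sierra. The two candidate features, the predicates in `HYPOTHESES`, and the bug labels `bug-A`, `bug-B`, `bug-C` are hypothetical placeholders.

```python
from itertools import product


def generate_candidates():
    """Hypothetical generator: every combination of two toy procedure features."""
    for disjuncts, goal_stack in product([1, 2], [True, False]):
        yield {"disjuncts_per_lesson": disjuncts, "uses_goal_stack": goal_stack}


# Each hypothesis is a separate test, kept outside the generator for transparency.
HYPOTHESES = [
    lambda p: p["disjuncts_per_lesson"] <= 1,
    lambda p: p["uses_goal_stack"],
]


def predictions(candidates, hypotheses):
    """Keep exactly the candidates that every hypothesis allows."""
    return [c for c in candidates if all(h(c) for h in hypotheses)]


def observational_adequacy(predicted, observed):
    """Compare predicted and observed bug sets."""
    predicted, observed = set(predicted), set(observed)
    return {
        "matched": predicted & observed,
        "missed": observed - predicted,      # observed bugs the model fails to generate
        "star_bugs": predicted - observed,   # generated bugs no learner exhibits
    }


print(predictions(generate_candidates(), HYPOTHESES))
print(observational_adequacy({"bug-A", "bug-B"}, {"bug-B", "bug-C"}))
```

Keeping the hypotheses as separate predicates is slower than folding them into the generator, but it makes it easy to swap one hypothesis for a competitor and see how the prediction set changes.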

The structure of the remainder of this document 

The next chapter presents the model (component 1 in the list above) and discusses its
observational adequacy (component 3). The remaining chapters present the hypotheses of the
theory (component 4) and the arguments supporting them (component 6). They are grouped into
three levels:

1. The architecture level establishes the basic relations between lesson sequences and the acquired
skill. The acquired skill is sometimes called a core procedure because it cannot be directly
observed. The architecture level also establishes the basic relations between the core
procedure and observable behavior during problem solving. Such behavior is sometimes
called the surface procedure despite the fact that it is occasionally rather non-procedural in
character. The felicity conditions and the hypotheses defining local problem solving are
defined in the architecture level. These hypotheses are expressed without using a formal
representation for core procedures. This allows the architecture level's hypotheses to be
defended at a relatively high level of detail using broad, general observations about the
character of learning and problem solving in this domain.



* It takes Sierra about 150 hours of Dorado time to process a single subtraction lesson sequence.
However, Sierra is a multiprocessor program that can be run unattended at night using as many
Dorados as it can find on our local Ethernet. It sometimes takes only a few days to process a lesson
sequence. This style of research would be infeasible without networks of fast Lisp machines, such
as the Dorado.

2. The representation level defines a formal representation for core procedures. The
representation level takes the architecture level as given, then uses the data to settle
representational issues. From another viewpoint, the hypotheses of the representation act as
absolute constraints on learning and problem solving (as opposed to relative constraints).
Given a particular lesson sequence, these absolute constraints determine a large space of
possible core procedures that could be acquired from it.

3. The bias level establishes relative constraints on learning and problem solving. Whereas the
representation level defines a space of possible core procedures, the bias level defines an
ordering relation over the core procedures in the space and states that learners choose core
procedures that are maximal in this ordering. The bias level takes both of the higher levels as
given.
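
The division of labor between the representation level and the bias level can be sketched as a filter followed by a maximization. The sketch below is a loose illustration, not the theory's formal statement: candidate core procedures are plain dictionaries, the absolute constraints are arbitrary predicates, and the ordering is stood in for by a numeric key, all of which are assumptions.

```python
def acquire(candidates, absolute_constraints, preference_key):
    """Return the candidates that satisfy every absolute constraint and are maximal in the ordering."""
    # Representation level: absolute constraints carve out the space of possible procedures.
    space = [c for c in candidates if all(ok(c) for ok in absolute_constraints)]
    if not space:
        return []
    # Bias level: keep only the procedures that are maximal under the ordering.
    best = max(preference_key(c) for c in space)
    return [c for c in space if preference_key(c) == best]


# Hypothetical candidates with made-up fields.
candidates = [
    {"name": "p1", "well_formed": True, "score": 2},
    {"name": "p2", "well_formed": True, "score": 3},
    {"name": "p3", "well_formed": False, "score": 5},
    {"name": "p4", "well_formed": True, "score": 3},
]
chosen = acquire(candidates, [lambda c: c["well_formed"]], lambda c: c["score"])
print([c["name"] for c in chosen])
# ['p2', 'p4']
```

Note that `p3` has the highest score but is never considered, because the absolute constraint removes it before the ordering is applied. More than one candidate can be maximal, which is one way a single lesson sequence could yield several predicted procedures.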

The three levels can be introduced by drawing analogies to several prominent traditions in cognitive
science. The architecture level is like a Piagetian theory in its broad-brush treatment of cognition.
The representational level is like a Chomskyan theory of syntax in that it is concerned with the
structure of mentally held information. The bias level is like Newell and Simon's theory of human
problem solving in its attention to detailed individual behavior and its use of computer simulations.
Each level has its own objectives, and each uses the data in different ways.

These three levels contain mostly competitive argumentation, and their format reflects this.
Each chapter argues a single issue. The chapter begins by laying out the competing hypotheses. It
indicates which hypothesis is ultimately chosen for inclusion in the theory. The body of the chapter
shows why the other hypotheses lead to a less adequate theory. The chapter ends with a summary
of the arguments and a formal statement of the adopted hypothesis. Chapter introductions and
conclusions have been written so that they can be understood without reading the chapter's body.
As a further aid to browsing readers, each level has a summary chapter that synopsizes the
arguments and hypotheses discussed in the level. These summaries may be read without having
read the level itself.

In addition to producing a six-component theory, the research produced a few surprises.
Mentioning one of them is perhaps a fitting end for this introductory chapter.

Felicity conditions > teleological rationalizations

From the outset of this research, it was clear that learning depended strongly on the examples
used in instruction. It was also clear that learning could not depend solely on the examples. Some
other kind of information had to be involved. The issue was, what information was being provided
by the curriculum, and what information did the student already have? A highly plausible
hypothesis was that learners possessed teleological rationalizations as prior knowledge. Teleological
rationalizations express the learner's presupposition that procedures have purposes and hence that
the "right" generalization of the examples to make is the one that leads to a procedure with
recognizable purposes for each of its parts. So, the learner acquires only subprocedures whose
content can be rationalized vis-à-vis the learner's general notions of purposes for procedures. For
instance, a simple rationalization is one that views a new step (subprocedure) as preparation for an
already known step (Goldstein's "setup step" schema, 1974).

This view was comfortably in line with the common view that a procedure can be learned only
to the extent that it is meaningful to the learner. Here, teleological rationalizations expressed the
meaning that learners give procedures. The rationalizations may not impart the correct semantics
(the semantics the teacher intended), so the procedure acquired may not be a correct one. Yet they
do give the procedure some kind of semantics.

The view that learning is necessarily meaningful seems now to be false for the present
domain. I was unable to detect any widely held teleological rationalizations. Moreover, those that I
guessed might be held, perhaps scattered idiosyncratically in the population, did not constrain
acquisition enough to explain the data. On the other hand, certain felicity conditions were
discovered that were strong enough to eliminate many of the ambiguities that teleological
rationalizations were supposed to settle. Although I had guessed a few felicity conditions some
years ago, I was surprised to discover the show-work principle, and even more surprised to see how
much constraint the felicity conditions placed on learning. Not only do the felicity conditions do as
much or more work than teleological rationalizations, they appear to be held by all individuals,
while the set of teleological rationalizations would have to be subject to individual differences. To
top it off, the new felicity conditions are much simpler than teleological rationalizations. For
several reasons, therefore, teleological rationalizations have been excluded from the theory. It
currently seems that rationalization of subprocedures might be more in the mind of the observer
(me) than in the student's mind.

Omitting teleological rationalizations in favor of felicity conditions changes the overall
character of the learning theory. Teleological rationalizations could give the acquired procedure a
meaning, albeit a potentially incorrect meaning, by relating it to general teleological knowledge
about procedures. The felicity conditions and the constraints on representation essentially allow the
procedure to be built from primitives in apparent isolation from other knowledge. This result is
consonant with the widely held impression that mathematical procedures are often understood
syntactically (Resnick, 1982). It tends to refute the also common view that procedures can only be
learned if they have some meaning for the learner.

Chapter 2
Sierra, the Model

This chapter concerns the model, a computer program named Sierra. The term "model" is
used in a narrow way to mean an artifact whose structure and performance is similar, in certain
ways, to the cognition under study. Under this usage, "model" is not synonymous with "theory."
The model is a thing; the cognition is a thing; the theory asserts how the two things relate. In
physics, a model is usually a system of equations, which the theory relates to the physical system
being studied. A physical theory might say, for instance, which variables in the model are
measurable, which equations represent natural laws, and which equations represent boundary
conditions that are idiosyncratic to particular experiments. Of course, theories include much more
than just assertions. This theory includes, for instance, a tacit set of distinctions or ways of
analyzing the cognition. It includes an analysis of how the data and the model's performance relate.
It includes, of course, the hypotheses and the competitive arguments that support them. Indeed,
everything in this document is included in the theory. This chapter, however, merely describes the
model.

AI-based models are plagued with a methodological problem that occurs in mathematical
models as well, although it is less severe there. A typical mathematical model has parameters whose
values are chosen by the experimenter in such a way that the model's predictions fit the data as
closely as possible. Certain parameters, often called task parameters, encode features of the
experimental task (e.g., what kind of stimulus material is used). Other parameters, called subject
parameters, encode aspects of individual subjects' cognition or performance. There are other kinds
of parameters as well. The difference between the parameters lies in how they are used in fitting
the model's predictions to the data. Subject parameters may be given a different value for each
subject. Task parameters get a different value for each experimental task, but that value is not
permitted to vary across individual subjects. When AI-based models have been used for cognitive
simulations, there has often been considerable obscurity in the boundary between what is meant to
be true of all subjects, and what is meant to be true of a particular subject. Often the same
knowledge base (rule set or whatever) is used for both subject parameters and task parameters. Yet
it is critical that theories, even if they use non-numeric parameters, identify which of the model's
components and principles are universal, which are task specific, and which may be tailored to the
individual. But this is just the beginning of the problem. Even if the kind of tailoring has been
clearly delineated as universal, task, subject, or whatever, there remains a difficult issue of
determining how much influence the theorist can exert over the model's predictions by
manipulating the parameters' values. In a mathematical model, such power is often measured by
counting degrees of freedom or performing a sensitivity analysis. For models whose "parameters"
are knowledge bases or rule sets, there is, as yet, no equivalent measure of tailorability. It is crucial,
however, that the tailorability of such models be better understood. A model whose fit to the data
depends on the cleverness of the theorist writing the rules doesn't really tell us much of interest.
Understanding Sierra's tailorability and reducing it have been major concerns in developing this
theory. Reduced tailorability is as much a goal for the theory as observational adequacy. Many of
the hypotheses that are presented in later chapters are adopted just because they reduce the
tailorability of the model.

In addition to describing the model, this chapter discusses its observational adequacy and
tailorability in the context of one particular experiment, called the Southbay experiment, wherein
1147 subtraction students were tested. As Sierra is described, its various parameters will be
illustrated by mentioning the values that they are given in tailoring the predictions to fit the
Southbay data. The effects of varying these values will also be discussed as a rough sensitivity
analysis of the model. The last section of the chapter assesses the observational adequacy of the
model with respect to the Southbay data. This section is the only place in the document where
observational adequacy will actually be measured, and for good reason. Having produced the
numbers, it will be argued that there are no magic thresholds for such measurements. One can't tell
from the numbers whether the theory is good or bad. Measuring observational adequacy is only
useful for comparing the theory to other theories, and in particular, for comparing it to other
versions of itself. That is, observational adequacy is useful primarily as an empirical criterion for
competitive argumentation. Although it is interesting to go through a full-fledged measurement of
observational adequacy, once is enough. Thereafter, observational adequacy will be used only as
part of competitive argumentation.

This chapter presents the model, Sierra, in enough detail that it can be duplicated. At one
point, this document had a separate chapter for this purpose. However, it became so redundant
with this chapter that the two were merged. Meeting this objective sometimes involves presenting
technical details that are theoretically irrelevant, but necessary for understanding how Sierra works.
The reader may skim over these details. For reading the remaining chapters, it suffices to grasp just
the broad outline of Sierra. In particular, only sections 2.1 and 2.2 are really necessary; the others
can be skipped on a first reading. What this chapter does not do is to justify the model.
Motivating, justifying and explaining why the model is the way it is: these are proper functions
for competitive argumentation. Competitive argumentation is the province of the remaining
chapters.

2.1 The top level of Sierra 

Sierra generates the theory's predictions about a certain class of experiments. In order to
understand the way Sierra makes predictions, it helps to first understand the experiments. The
experiments use the following procedure. For each school district, the experimenter ascertains what
textbooks are used in teaching the given skill and when it is scheduled to be taught. In the case of
the Southbay experiment, subtraction was taught from the middle of the second grade to the end of
the fourth grade. Classrooms and testing dates are selected so as to sample this time span fairly
evenly. Next, the experimenter meets with the participating teachers in order to brief them and to
give them blank test forms, such as the one in figure 2-1. Soon thereafter, the teachers hand out
the test sheets to their students, who work them alone with no time limit. The teacher collects the
test sheets and mails them to the experimenter for analysis. An important point to notice is the
temporal relationship between the administration of the test and the episodes of lesson learning.
Suppose that a certain curriculum has ten lessons, call them L1, L2, ... L10. Some of the students
have taken only lesson L1 at the time they are tested, while other students have taken only L1 and
L2, and so forth. Although a few students have taken the whole lesson sequence at the time they
were tested, many data come from students who have traversed only a prefix of the lesson sequence.

This motivates the top-level design of Sierra, which is sketched in figure 2-2. Sierra's major
components are called the learner and the solver. Sierra's learner is given a lesson, L1, and an initial
knowledge state, KS0. (Actually, it is given formal representations of L1 and KS0. The formal
representations will be discussed later.) The learner produces a new knowledge state, KS1. It may
produce more than one knowledge state, but just one is shown in the diagram for simplicity's sake.
In order to generate predictions about students who have only taken L1, KS1 is given to Sierra's
solver along with (a formal representation of) a diagnostic test, T. The solver produces a set of
solved tests, ST1. Each solved test in ST1 represents a testable prediction about student behavior.
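
The data flow just described can be sketched in a few lines of Python. This is only an illustrative
sketch under stated assumptions: the names `run_sierra`, `toy_learner` and `toy_solver` are
hypothetical, and the toy learner and solver merely stand in for Sierra's real components, whose
workings are described in later sections.

```python
from typing import Callable, FrozenSet, List, Set

KnowledgeState = FrozenSet[str]  # toy stand-in for a real knowledge state


def run_sierra(
    learner: Callable[[str, KnowledgeState], Set[KnowledgeState]],
    solver: Callable[[KnowledgeState, str], Set[str]],
    ks0: KnowledgeState,
    lessons: List[str],
    test: str,
) -> List[Set[str]]:
    """Return [ST1, ST2, ...]: one set of solved tests per lesson prefix."""
    states = {ks0}
    solved_per_stage = []
    for lesson in lessons:
        # The learner may yield several knowledge states per input state.
        states = {new for ks in states for new in learner(lesson, ks)}
        # Every current knowledge state is run on the same diagnostic test.
        solved = {st for ks in states for st in solver(ks, test)}
        solved_per_stage.append(solved)
    return solved_per_stage


def toy_learner(lesson: str, ks: KnowledgeState) -> Set[KnowledgeState]:
    # Hypothetical: "learning" a lesson just records its name.
    return {ks | {lesson}}


def toy_solver(ks: KnowledgeState, test: str) -> Set[str]:
    # Hypothetical: a "solved test" is a string naming the lessons used.
    return {test + " solved with " + "+".join(sorted(ks))}


stages = run_sierra(toy_learner, toy_solver, frozenset(), ["L1", "L2", "L3"], "T")
for i, solved_tests in enumerate(stages, start=1):
    print("ST%d:" % i, sorted(solved_tests))
```

Each pass through the loop corresponds to one learner box and one solver box: the knowledge states
produced for one lesson are both tested and handed on to the next lesson.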
Figure 2-1
One of the test forms used to collect the subtraction data.

         L1                L2                L3
          |                 |                 |
          v                 v                 v
KS0 -> learner -> KS1 -> learner -> KS2 -> learner -> KS3 -> ...
                   |                 |                 |
                   v                 v                 v
           T -> solver       T -> solver       T -> solver
                   |                 |                 |
                   v                 v                 v
                  ST1               ST2               ST3

Figure 2-2
The top level of Sierra



Figure 2-2 shows that KS1 is also given to Sierra's learner along with lesson L2. The learner
produces KS2, which corresponds to the knowledge state of students who have seen the first two
lessons of the sequence before being tested. KS2 is passed to the solver and processed in the same
way that KS1 was. This produces predictions about the performances of students who have taken
the first two lessons. Similarly, predictions are produced for students at all other stages of training,
including students who have completed the lesson sequence.

The model's predictions are the sets of solved tests, the STi. In principle, they could be
compared directly to observed test solutions, the ones mailed in by the teachers. For several
mundane reasons, this is not practical. Several test forms are used in the schools in order to thwart
students who look at their neighbor's paper. If the STi were to be compared directly to the
observed test solutions, Sierra would have to be run many times, each with a different test form as
T. Also, direct comparison of test solutions would have to deal with the slips that students make.
A single facts error (e.g., 7-5=3) would prevent an observed test solution from matching a
predicted test solution. Some model of slip-based "noise" would have to be applied in the
matching process. Even if such a slip model were quite rudimentary, it would have to be carefully
and objectively parameterized lest it cause Sierra to be unfairly evaluated. Debuggy is used to solve
these problems. Debuggy is equipped to deal with multiple test forms and with slip-based noise
(see Burton, 1981). Its slip model, which was developed long before this theory, has been carefully
honed in the process of analyzing thousands of students' work. Debuggy is used to analyze both
predicted and observed test solutions. When Debuggy analyzes a solved test, it redescribes the test
solution as a set of bugs. Sometimes the set is a singleton, but often a test solution, even one
generated by the model, requires several bugs to accurately describe its answers. Given these bug
sets, matching is simple. A predicted test solution matches an observed test solution if Debuggy
converts both to the same set of bugs.

This way of comparing test solutions has an added benefit. It affords a natural definition of
partial matching: two test solutions partially match if the intersection of their bug sets is
non-empty. Partial matching is a useful investigative tool. For instance, if the model generates a test
solution whose bug set is {A B}, and there is a test solution in the data whose bug set is {A B C},
then partial matching allows one to discover that the model is accounting for most of the student's
behavior, but the student has a bug C that the model does not generate. If the two solved tests
were compared directly, they would not match at all (say), yielding the experimenter no clue as to
what is wrong. So comparing solved tests via Debuggy not only handles multiple test forms and
noise, it promotes a deeper understanding of the empirical qualities of the model.
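
The two matching criteria amount to simple set operations. The following Python sketch assumes
that Debuggy's analysis has already turned each solved test into a set of bug names; the function
names are hypothetical and are used here only for illustration.

```python
def matches(predicted_bugs, observed_bugs):
    """Full match: both test solutions reduce to the same bug set."""
    return set(predicted_bugs) == set(observed_bugs)


def partially_matches(predicted_bugs, observed_bugs):
    """Partial match: the two bug sets have a non-empty intersection."""
    return bool(set(predicted_bugs) & set(observed_bugs))


def unexplained(predicted_bugs, observed_bugs):
    """Observed bugs that the predicted test solution does not contain."""
    return set(observed_bugs) - set(predicted_bugs)


predicted = {"A", "B"}        # bug set of a solved test generated by the model
observed = {"A", "B", "C"}    # bug set of a student's solved test

print(matches(predicted, observed))            # False
print(partially_matches(predicted, observed))  # True
print(unexplained(predicted, observed))        # {'C'}
```

With the bug sets from the example in the text, the full match fails, the partial match succeeds, and
the set difference isolates bug C as the part of the student's behavior the model does not generate.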

Sierra has a natural internal chronology. KS2 is necessarily produced after KS1. Perhaps this
chronology makes true temporal predictions. In the Southbay experiment, for instance, the testing
dates and the textbooks are known, so the approximate locations of each student in the lesson
sequence can be inferred. It would be remarkable if an STi matched only the test solutions of
students between the lessons corresponding to Li and Li+1. Longitudinal data could even be
predicted, provided that the model is changed slightly*. Given that a student's test solution
matched a solved test in STi, one could predict that a later test solution would have matched some
test in STj for j>i. In fact, one may be able to predict that the second test would have to match
certain tests of the STj, because only those test solutions are derived from the knowledge state KSi
that the student seemed to have at the time of the first test. Although Sierra was not designed for
it, Sierra can make predictions about the chronology of skill acquisition.

Even a cursory examination of the data reveals that such chronological predictions would turn
out rather poorly. Partly, this is because the experiments didn't carefully assess chronological
factors. Although the general locations of students in the curricula were recorded, there is no way
to know an individual's case history in any detail. In the Southbay experiment, for instance, some
young students who had only taken the first few subtraction lessons could already subtract perfectly.
Perhaps they learned at home or with special tutoring from the teacher. Keeping careful track of
how much instruction students actually receive is, of course, a major problem in any longitudinal
study. That is why I have concentrated on an a-chronological account of skill acquisition.

Even if excellent longitudinal data were available, I doubt that Sierra's prediction of them
would be anywhere near the mark. Basically, this theory attacks only half of school-house learning:
knowledge communication. Knowledge compilation is the other half. It deals with tuning,
restructuring and other changes in the memory trace that occur with practice. Knowledge
compilation undoubtedly affects the chronology of skill acquisition. Since Sierra doesn't model
practice effects, it would be wild to take its chronology seriously as a reflection of the chronology of
human learning.

The model's empirical quality is measured in an a-chronological way. All the STi are simply
unioned. This creates a large set of predicted solved tests, call it PST. Similarly, the observed
solved tests are collected together into a large set, call it OST, without regard to when the students
were tested. The solved tests in both PST and OST are redescribed as bug sets using Debuggy.
Empirical quality is measured by their overlap:

OST∩PST is the set of confirmed predictions. It should be large.

OST-PST is the set of observed behaviors that the model doesn't account for. It should be small.

PST-OST is the set of predictions that are not confirmed by the data. Some of these predictions will
        be absurd: star bugs. There should be very few of these. The rest are outstanding
        predictions. Further data may verify them. It doesn't matter how large the set of
        outstanding predictions is, as long as its members are all plausible predictions.



* Diagnostic testing undoubtedly has some effect on a student's knowledge state. If the model were
used to make predictions about students who are tested twice, it would be advisable to route the
solver-modified KSi back up to the learner. This is not done in the current version of Sierra
because almost all of the data come from students who were tested just once. Some were tested
twice, but without intervening instruction.
This overlap-based measure is traditionally called observational adequacy. It is the only empirical
measure that is used in validating the present theory.
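
The overlap measure can be written down directly as set arithmetic. The sketch below is
illustrative only: it assumes each solved test has already been redescribed as a set of bug names,
and it assumes that the theorist supplies the list of absurd predictions (star bugs) separately, since
judging absurdity is not something the set operations can do. The function name and the bug
names are hypothetical.

```python
def observational_adequacy(pst, ost, star=frozenset()):
    """Split predicted (PST) and observed (OST) bug sets by their overlap."""
    pst, ost, star = set(pst), set(ost), set(star)
    unconfirmed = pst - ost
    return {
        "confirmed": pst & ost,               # OST ∩ PST, should be large
        "unaccounted": ost - pst,             # OST - PST, should be small
        "star": unconfirmed & star,           # absurd predictions, should be few
        "outstanding": unconfirmed - star,    # plausible but not yet observed
    }


# Each solved test is a frozenset of (made-up) bug names.
pst = {frozenset({"A"}), frozenset({"A", "B"}), frozenset({"X"})}
ost = {frozenset({"A"}), frozenset({"C"})}
star = {frozenset({"X"})}

result = observational_adequacy(pst, ost, star)
for name, tests in result.items():
    print(name, len(tests))
```

Because the unconfirmed predictions are partitioned into star bugs and outstanding predictions,
the four reported sets together cover every member of PST and OST exactly once.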

This section describes the way that the main parts of the model (the learner, the solver, the
KSi, the Li and the STi) hook together. It also describes the way that the theory's observational
adequacy is assessed. With these frameworks in hand, it's time to plunge into a detailed description
of the model. The first section describes the formal representations for lessons and solved tests, the
Li and the STi. The next few sections describe the knowledge representation, the KSi. Then the
learner is described, with a slight pause for some general remarks about induction. The solver is
described next, but somewhat sketchily since it is substantially the same as the one described in
Brown & VanLehn (1980). The last section reveals the observational adequacy of the theory
vis-a-vis the Southbay experiment.

2.2 The representation of observables: lessons and diagnostic tests

As mentioned, Sierra takes three inputs: (1) a lesson sequence, Li, (2) a diagnostic test, T, and
(3) a student's initial knowledge state, KS0. Sierra's output is a large set of solved tests, the STi.
Although the theorist must guess what the initial student knowledge state is, the other inputs and
outputs represent observable quantities. Sierra's accuracy as a model depends somewhat on how
these observable quantities are formalized. This section discusses the representations used for the
observables: lessons, tests, and solved tests. The formal definitions are tediously simple:

A lesson sequence is a list of lessons.

A lesson is a pair: it is a list of examples followed by a list of exercises.

An example is a sequence of problem states.

An exercise is a single problem state.

A test is a list of exercises.

A solved test is a list of examples.

A problem state is a set of symbol-position pairs, where a symbol's position is represented
by the Cartesian coordinates of the symbol's lower left corner and its upper right corner.

These definitions all depend on the representation of problem states, so it is worth a moment to
examine that definition in detail. Problem state a (see below) represents b, and c represents d.
a.  ((HBAR (12 17 20 17))          b.     5 0 7
     (-    (12 17 14 19))               -   2 9
     (5    (14 19 16 21))               -------
     (0    (16 19 18 21))
     (7    (18 19 20 21))
     (2    (16 17 18 19))
     (9    (18 17 20 19)))

c.  ((6 (12 10 14 12))             d.   6x+1
     (x (14 10 16 12))
     (+ (16 10 18 12))
     (1 (18 10 20 12)))



The formal representations, a and c, are sets of pairs. Each pair represents an instance of a symbol
at a place. The first element of the pair is the symbol, usually an alphanumeric character or a
special symbol like HBAR, which stands for a horizontal bar. The second element of the pair is a
tuple of four Cartesian coordinates that represent the symbol's position. The details of representing
the symbol's position don't matter. The point is only that a problem state is little more than a
picture of a piece of paper or a chalkboard. It is not an interpretation or parsing of the symbols.
For instance, the problem state does not force the model to treat 507-29 as two rows, or as three
columns, or as rows and columns at all. How the problem state is parsed is determined by a
component of the student knowledge state, called the grammar. Grammars are described in a later
section.
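
The definitions of the observables translate almost word for word into type declarations. The
Python sketch below is one possible rendering, not Sierra's own: the type names and the helper
`left_to_right` are hypothetical. The helper sorts symbols by their left edge purely so that a
one-line problem state can be displayed; it is not a substitute for the grammar, which is what
actually parses a problem state.

```python
from typing import FrozenSet, List, NamedTuple, Tuple

Position = Tuple[int, int, int, int]   # (x1, y1, x2, y2): lower left, upper right
SymbolAt = Tuple[str, Position]        # one instance of a symbol at a place
ProblemState = FrozenSet[SymbolAt]     # an unparsed "picture" of the page
Example = List[ProblemState]           # a sequence of problem states
Exercise = ProblemState                # a single problem state
DiagnosticTest = List[Exercise]
SolvedTest = List[Example]


class Lesson(NamedTuple):
    examples: List[Example]
    exercises: List[Exercise]


LessonSequence = List[Lesson]

# Problem state c from the text: the expression 6x+1.
c: ProblemState = frozenset({
    ("6", (12, 10, 14, 12)),
    ("x", (14, 10, 16, 12)),
    ("+", (16, 10, 18, 12)),
    ("1", (18, 10, 20, 12)),
})


def left_to_right(state: ProblemState) -> str:
    """Display helper (an assumption): order symbols by their left edge."""
    return "".join(sym for sym, pos in sorted(state, key=lambda pair: pair[1][0]))


lesson = Lesson(examples=[[c]], exercises=[c])
print(left_to_right(c))        # 6x+1
print(len(lesson.examples))    # 1
```

Using a set rather than a list for the problem state reflects the point made above: the pairs carry
no ordering or grouping of their own.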
[Figure 2-3: a textbook page titled "Trading Hundreds First". The page poses a word problem
("There are 304 birds at the Lincoln Zoo. 126 birds are from North America. How many birds are
from other places?"), then works the example 304 - 126 = 178 in four captioned steps: "Need more
ones? Yes. But no tens to trade. Need more tens."; "Trade 1 hundred for 10 tens."; "Trade 1 ten
for 10 ones."; "Subtract the ones. Subtract the tens. Subtract the hundreds." It ends with the
answer, "178 birds are from other places," followed by a set of numbered subtraction exercises
such as 401 - 182, 205 - 77 and 300 - 151.]

Figure 2-3

A page from a third grade mathematics textbook.
(Bitter, G.G., Greenes, C.E., Sobel, M.A., Hill, S.A., Maletsky, E.M., Shufelt, G., Schulman, L. &
Kaplan, J. Mathematics. New York: McGraw-Hill, 1981. Reproduced with permission.)
How faithful are these formal representations to real curricula and real diagnostic tests?
Faithfulness of tests is easy to obtain. Earlier, in figure 2-1, a copy of one of the diagnostic tests
was presented. It can be quite faithfully represented as a sequence of exercises (problem states).
Accurately representing a lesson is not so simple. Figure 2-3 is a black-and-white rendering of a
page from a third grade textbook. It is the first page of a two-page lesson that introduces
borrowing across zero. The lesson leads off by posing a word problem. It is followed by an
example, 304-126. The example consists of four problem states. (In the textbook, the four
problem states are differentiated by four lightly colored boxes, which are not reproduced here.)
The rest of the page contains exercises. However, the teacher undoubtedly works the first few
exercises on the chalkboard. In effect, this converts the first few exercises into examples. The
second page of the lesson contains more exercises and a few word problems.

Sierra's lessons differ from real lessons in several ways. In keeping with the hypothesis that
knowledge communication, in this domain, is inductive, Sierra's examples lack the English
commentary that the real examples have. Its lessons also omit word problems, pictures and
analogies with concrete objects like coins or blocks. They have only examples and exercises. Figure
2-4 summarizes the formal lesson corresponding to the real lesson of figure 2-3. Figure 2-4a shows
the problem state sequence that represents the first example. On the assumption that the teacher
would work this example on the board, the intermediate problem states that are not pictured in the
textbook are shown in the formal version of the example. Figure 2-4b summarizes the whole
lesson. The formal lesson is considerably shorter than the real lesson: it has fewer examples
(probably) and many fewer exercises. Since Sierra is slow, I have kept the lessons as short as
possible. This makes it more difficult to keep the lessons faithful to the real lessons. In a set of
examples and exercises, there might be idiosyncratic features that happen to be held by all of them.
The difficulty is that the formal lesson might have different idiosyncrasies than the real lesson.
Since Sierra's learner is mildly sensitive to such idiosyncrasies, this difference can't be ignored. So
ensuring the faithfulness of lessons is not trivial.



[Figure 2-4: (A) eight problem states, labeled a through h, for the example 304 - 126: the initial
problem, the 3 overwritten by 2, the 0 overwritten by 10, the 10 overwritten by 9, the 4 overwritten
by 14, and then the answer digits written one at a time to give 8, 78 and finally 178. (B) a row of
worked examples and unworked exercises summarizing the formal lesson.]

Figure 2-4

A shows the first example of the lesson as a problem state sequence (omitting crossing-out actions).
B summarizes the three examples and three exercises that constitute the formal lesson.



How an individual lesson is represented was just discussed. A curriculum is formalized as a
sequence of lessons. Some of the tacit issues behind formalizing curricula are best discussed in the
context of specific cases. Two textbooks were used by the schools that participated in the Southbay
experiment: the 1975 edition of Scott-Foresman's Mathematics Around Us, and the 1975 edition of
Heath's Heath Elementary Mathematics. From these textbooks, three formal lesson sequences were
eventually derived. (This development is interesting partly because it is a clear case of tailoring a
parameter of the model.) Some curricular features that at first seemed to be important turned out
not to be. In particular, both textbooks introduce multicolumn subtraction using special notational
devices that emphasize the columns and their names. Scott-Foresman labels the digits, as in a
below, then switches to column labels, as in b, then finally to standard notation, as in c.



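[Illustration: the same two-digit problem in three notations. (a) each digit is labeled with its
place name, as in "3 tens 7 ones" over "5 ones"; (b) the digits are written under the column
headings "tens" and "ones"; (c) standard notation, with no labels.]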



Heath starts with one of these labeled notations, then switches to c. Generally, the textbooks would
stick with their first notation until the second lesson on borrowing. Then they would shift to the
next notation, and teach the last few lessons over again using the new notation. Sierra's first formal
lesson sequences copied these notational excursions faithfully, lines, words and all. It was found
that these extra markings made no significant difference in Sierra's predictions. When the extra
markings were omitted from the examples, the resulting core procedures generated the same bugs.
This finding suggests that the extra markings are included in the examples because they help
students learn a grammar for subtraction notation. Sierra is given a grammar instead of learning it
(this is discussed in the next section), so it receives no benefit from the extra markings. The lesson
sequences that were ultimately arrived at use only the standard notation (type c above). This makes
them shorter, saving Sierra time.


There are a few more minor differences between the real lesson sequences and the formal
ones that will be discussed later. A major difference, perhaps the most important difference, will be
discussed next. Figure 2-5 shows the lesson sequences for Heath (H) and for Scott-Foresman (SF).
Note that both H and SF involve a special lesson on regrouping. (In the McGraw-Hill lesson of
figure 2-3, this subskill is called "trading" instead of "regrouping.") The regrouping lesson is L3 in
H and in SF. The regrouping lesson does not teach how to answer subtraction problems per se.
It teaches how to do a subprocedure, regrouping, that is later incorporated into the subtraction
procedure. It is possible that students may not understand that this regrouping lesson has anything
to do with subtraction. After all, students are being taught many other skills (e.g., addition) as they
are taught subtraction, yet few develop subtraction bugs by mistakenly incorporating lessons from
addition or other skills. Very little is known about how students filter irrelevant lessons out of a
skill's lesson sequence. But whatever this filter is, students may use it to filter out the regrouping
lesson as well as addition lessons. To test this, a third lesson sequence was constructed by deleting
the regrouping lesson from H. This lesson sequence, HB, turned out to be quite productive. It
generated eight observed bugs that would not otherwise have been generated*. So it seems that
some students take regrouping to be a part of subtraction and some don't. Lesson sequence HB is
included with the other two in generating the Southbay predictions.
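
Deriving HB from H is a one-line filtering operation. In the Python sketch below, each lesson is
represented only by its topic name as listed for the H sequence in figure 2-5; that representation
and the helper name `without_lesson` are assumptions made for illustration, since a real formal
lesson consists of examples and exercises.

```python
H = [
    "Solving two columns",
    "Handling partial columns",
    "Regrouping",
    "Simple borrowing",
    "Solving three columns without borrowing",
    "Handling non-final partial columns",
    "One borrow in three columns",
    "Two adjacent borrows (3 columns)",
    "Borrowing from zero (3 columns)",
    "Borrowing from multiple zeros",
]


def without_lesson(sequence, topic):
    """Return a copy of the lesson sequence with one topic filtered out."""
    return [lesson for lesson in sequence if lesson != topic]


HB = without_lesson(H, "Regrouping")
print(len(H), len(HB))   # 10 9
```

The original sequence is left untouched, so both H and HB remain available as separate inputs to
the model.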



* The observed bugs generated by HB alone are: Borrow-Don't-Decrement-Zero-Unless-Bottom-
Smaller, Borrow-Across-Second-Zero, Borrow-From-One-Is-Nine, Borrow-From-One-Is-Ten,
Borrow-From-Zero, Borrow-From-Zero-Is-Ten, Stops-Borrow-At-Multiple-Zero, Forget-Borrow-
Over-Blanks, and Smaller-From-Larger-Instead-of-Borrow-Unless-Bottom-Smaller.
[Figure 2-5: for each lesson sequence, a row of sample worked problems, one per lesson, above a
list of lesson topics.]

A. Topics of the H lesson sequence:

L1. Solving two columns.
L2. Handling partial columns.
L3. Regrouping.
L4. Simple borrowing.
L5. Solving three columns without borrowing.
L6. Handling non-final partial columns.
L7. One borrow in three columns.
L8. Two adjacent borrows (3 columns).
L9. Borrowing from zero (3 columns).
L10. Borrowing from multiple zeros.

B. Topics of the SF lesson sequence:

Regrouping.
Borrowing in 3-digit problem.
Non-borrowing in 3-digit problem.
Borrowing in 4-digit problem.
Non-borrowing in 4-digit problem.
Solving 3 columns, with one borrow.
Adjacent borrows, in 4-column problem.
Borrowing from zero (4 columns).
Borrowing from multiple zeros.

Figure 2-5

(A) The H lesson sequence. (B) The SF lesson sequence.
Sample problems are shown above, topics are listed below.



Of the three inputs to the model (the initial knowledge state KS0, the test T, and the lesson
sequence Li), the one that has the most effect on the model's predictions is the lesson sequence.
In fact, for the Southbay experiment, only three runs of Sierra were used, one for each of H, SF
and HB. The same KS0 and T were used with each run because they have very little effect on the
ultimate output.
The representation of procedures



This section discusses the representation of student knowledge. The particular representation
to be described is the ninth in a series of representations, which began with a homebrew version of
the OPS2 production system language (Forgy & McDermott, 1978). The present knowledge
representation is very much the product of [...]. It expresses empirical hypotheses of the theory.
This methodological stance deserves comment.

Every AI-based model of cognition that I know of has some kind of knowledge
representation language. Various kinds of production systems are common, for example. Often,
one reads that a theorist has revised a widely available language in order to make it "better" for the
model under development, yet no theoretical claims are attached to this implicit assertion of
optimality. The knowledge representation language is being treated as a mere notation that the
theorist may change at will in order to make it more convenient to use.

However, one often sees conjectures that the knowledge representation is more than a mere
notation (e.g., "We confess to a strong premonition that the actual organization of human programs
closely resembles the production system organization." Newell & Simon, 1972, pg. 803). Fodor
(1975) argues that such conjectures may be legitimate as scientific hypotheses about the mind. It is
clear that the mind holds information (knowledge) and it is plausible that this information is
structured in some way. Therefore, it makes sense to ask what that structure is. One way to find
out what the structure of knowledge is (in Fodor's terms, to determine the mind's mentalese) is to
find constraints that structure a model's knowledge representation in theoretically efficacious ways.
Given that these constraints succeed for information in a model of the mind, their success may be
due to the fact that they reflect constraints on information in the mind itself. This investigation's
search for the optimal representation of procedural knowledge for the model is motivated, in part,
by faith in Fodor's research programme.

The catch is showing that the success of the model actually depends on the constraints. A
proposed constraint on mentalese is not convincingly supported if violating it still allows a successful
model to be constructed. The hard part, therefore, is showing that the form of the knowledge
representation makes a difference in the model's predictions. This typically requires rather
complicated competitive arguments. I was surprised to find as many as I did. Indeed, most of the
argumentation in following chapters concerns the representation.

That's enough commentary. Let's move on to the knowledge representation itself. A student's
knowledge state is represented by a four-tuple:

1. a procedure: knowledge about appropriate problem solving actions and their sequence

2. a grammar: knowledge about the syntax of a mathematical notation

3. patches: knowledge about past impasses and repairs

4. critics: knowledge about "wrong" problem states and problem solving actions

The most important of these is the procedure (sometimes called the core procedure). Procedures
and grammars will be described in this section and two following it. Patches and critics are
components of repair theory that won't be described in detail in this chapter.

A procedure is represented as an And-Or graph, or AOG (Winston, 1977). Figure 2-6a
sketches an AOG for a version of subtraction that will be often used in this document for
illustrations. AOG nodes are called goals, and links are called rules. Rules are directed, and are
always drawn running downward. The goals just beneath a goal are called its subgoals.
A

[Drawing of the AOG as a tree of goals. Legible node labels include SUB1COL, SUB/REST,
1/SHOW, 1/SHOW2, 1/BORROW, 2/BORROW, BORROW/FROM, BORROW/INTO,
X/OVRWRT and W/OVRWRT.]

B

Goal: START (P)   Type: OR
  1. ∅ -> (SUB P)

Goal: SUB (P)   Type: AND
  1. Let T, B and A be the top, bottom and answer of the
     rightmost column of problem P -> (1/SUB T B A)

Goal: 1/SUB (T B A)   Type: OR
  1. Regrouping problem format -> (REGROUP T)
  2. There is a column to the left of T -> (MULTI T B A)
  3. ∅ -> (Write A (Sub (Read T) (Read B)))

Goal: MULTI (T B A)   Type: AND
  1. ∅ -> (SUB1COL T B A)
  2. Let NT, NB and NA be the top, bottom and answer
     of the left-adjacent column to T -> (SUB/REST NT NB NA)

Goal: SUB/REST (T B A)   Type: OR
  1. There is a column to the left of T -> (MULTI T B A)
  2. B is blank -> (SHOW T A)
  3. ∅ -> (Write A (Sub (Read T) (Read B)))

Goal: SHOW (T B A)   Type: AND
  1. ∅ -> (1/SHOW T A)

Goal: 1/SHOW (T A)   Type: OR
  1. ∅ -> (Write A (Read T))

Goal: SUB1COL (T B A)   Type: OR
  1. B is blank -> (SHOW2 T A)
  2. T<B -> (BORROW T B A)
  3. ∅ -> (Write A (Sub (Read T) (Read B)))

Goal: SHOW2 (T B A)   Type: AND
  1. ∅ -> (1/SHOW2 T A)

Goal: 1/SHOW2 (T A)   Type: OR
  1. ∅ -> (Write A (Read T))

Goal: BORROW (T B A)   Type: AND
  1. ∅ -> (1/BORROW T)
  2. ∅ -> (2/BORROW T B A)

Goal: 1/BORROW (T)   Type: OR
  1. ∅ -> (REGROUP T)

Goal: 2/BORROW (T B A)   Type: OR
  1. ∅ -> (Write A (Sub (Read T) (Read B)))

Goal: REGROUP (T)   Type: AND
  1. Let NT be the top digit of the left-adjacent column to T
     -> (BORROW/FROM NT)
  2. ∅ -> (BORROW/INTO T)

Goal: BORROW/INTO (T)   Type: OR
  1. ∅ -> (OVRWRT T (Concat (One) (Read T)))

Goal: BORROW/FROM (TD)   Type: OR
  1. TD is zero -> (BFZ TD)
  2. ∅ -> (OVRWRT TD (Sub1 (Read TD)))

Goal: BFZ (TD)   Type: AND
  1. ∅ -> (1/BFZ TD)
  2. ∅ -> (2/BFZ TD)

Goal: 1/BFZ (TD)   Type: OR
  1. ∅ -> (REGROUP TD)

Goal: 2/BFZ (TD)   Type: OR
  1. ∅ -> (OVRWRT TD (Sub1 (Read TD)))

Goal: OVRWRT (D N)   Type: AND
  1. ∅ -> (X/OVRWRT D)
  2. Let X be the blank space over D -> (W/OVRWRT X N)

Goal: X/OVRWRT (D)   Type: OR
  1. ∅ -> (CrossOut D)

Goal: W/OVRWRT (X D)   Type: OR
  1. ∅ -> (WriteBX X D)

Figure 2-6

AOG for a correct subtraction procedure acquired from the H lesson sequence.
There are two types of goals: AND and OR. To execute an AND goal all the subgoals are
executed. To execute an OR goal just one of the subgoals is executed. AND goals are drawn with
boxes around their labels. Drawings of AOGs abbreviate goals whenever they appear more than
once. For instance, OVRWRT is called from several places in the AOG of figure 2-6a but its
subgoals are drawn only for one of these occurrences. Although abbreviation makes this AOG look
like a tree, it is really a cyclic directed graph due to the recursive calls of MULTI and REGROUP.

AOG goals are called non-primitive if they have subgoals and primitive if they don't. To avoid
clutter, AOG drawings display only non-primitive goals and their subgoals. Only four kinds of
primitive goals are allowed:

1. Primitive actions cause a change in the current problem state. The only primitive actions used
   in mathematics are ones that write a given alphanumeric symbol at a given position (Write
   and WriteBX) or ones that write special kinds of symbols (CrossOut puts a slash over a
   symbol). These three primitive actions are the only ones used in the AOGs for the Southbay
   experiments.

2. Facts functions return a number without changing the problem state. The facts functions used
   in the Southbay AOGs are Add, Sub, Add1, Sub1, Mult, Quotient, Remainder, One
   (which always returns 1), Zero (which returns 0), and Concat (which concatenates two
   numbers, e.g., (Concat 1 4) returns 14).

3. Facts predicates return true or false without changing the problem state. The facts predicates
   used in the Southbay AOGs are: LessThan?, Equal?, and Divisible?.

4. The primitive function, Read, returns the symbol written at a given place. Thus,
   (LessThan? (Read T) (Read B)) is true if the digit at the place denoted by T is less than
   the digit at the place denoted by B.
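The four kinds of primitives can be illustrated with a minimal Python sketch. It is not Sierra's
implementation: the ProblemState class, the use of strings as place names, and the lower-case
function names are assumptions made only for this illustration.

```python
class ProblemState:
    """A toy problem state: a mapping from places to the symbols written there."""

    def __init__(self):
        self.cells = {}

    def write(self, place, symbol):   # primitive action Write
        self.cells[place] = symbol

    def read(self, place):            # primitive function Read
        return self.cells[place]


def sub(a, b):                        # facts function Sub
    return a - b


def one():                            # facts function One (always returns 1)
    return 1


def concat(a, b):                     # facts function Concat: (Concat 1 4) returns 14
    return int(f"{a}{b}")


def less_than(a, b):                  # facts predicate LessThan?
    return a < b


state = ProblemState()
state.write("T", 7)
state.write("B", 3)
# Python rendering of the form (Write A (Sub (Read T) (Read B)))
state.write("A", sub(state.read("T"), state.read("B")))
```

Only write changes the problem state; the facts functions and the facts predicate merely compute
values from their arguments.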

Primitive goals are, by definition, indecomposable: they have no subgoals. Since Sierra's learner
learns by composing goals from subgoals, all primitive goals are necessarily a part of the initial
knowledge state, KS0. However, the initial knowledge state may contain non-primitives as well as
primitives. For instance, the initial procedure from which the procedure of figure 2-6 was learned
contains the non-primitive goal OVRWRT, which crosses out a symbol and writes another symbol
over it.

AOG drawings, such as figure 2-6a, do not indicate several kinds of information. To see this
information in Sierra, one merely touches a goal with the mouse (a pointing device) and the goal's
complete definition is printed out. In this document, more cumbersome methods must be used to
display goal definitions. Figure 2-6b shows the definitions for the non-primitive goals in the AOG
of figure 2-6a. Goals have arguments, which have the semantics that a recursive procedure's
arguments have in a computer language. For instance, SUB1COL has three arguments, T, B, and A.
A goal's rules (i.e., the rules leading from it to its subgoals) are listed in the definition. SUB1COL
has three rules. Each rule has a pattern and an action. Patterns are complex, so their description
will be delayed for a moment. (In figure 2-6b, non-null patterns are replaced by English glosses; a
null pattern always matches.) An action is a form, in the Lisp sense, which calls the rule's
subgoal. The action may pass arguments to the subgoal, often by evaluating facts functions. For
instance, SUB1COL's third rule has (Write A (Sub (Read T) (Read B))) as its action. This
form calls the goal Write, passing it the value of A as its first argument and a number, roughly
T-B, as its second argument. (Throughout this document, T, B and A will stand for the top
(minuend), bottom (subtrahend) and answer places in a column.) What this action does is write the
difference of the top and bottom digits of a column in the column's answer.

An OR goal's rules are tested in left-to-right order. The first rule whose pattern matches is
executed. The learner adds new rules at the left. Hence, the left-to-right ordering convention
corresponds to a common conflict resolution strategy in production systems called "recency in long
term memory" (McDermott & Forgy, 1978). Because the patterns of OR rules test whether to
execute a rule, they are called test patterns. Although AND rule patterns have the same syntax as OR
rule patterns, they are not used to control which rules are executed. The order of execution of AND
rules is fixed: the rules are executed in left-to-right order. AND rule patterns are used to retrieve
information in the current problem state so that the information can be passed to the rule's subgoal.
AND rule patterns are called fetch patterns.
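The execution conventions just described can be sketched in a few lines of Python. This is a
hedged illustration, not Sierra's interpreter: the Rule and Goal classes, the dictionary used as a
problem state, and the stubbed BORROW action are all assumptions introduced here. Patterns are
reduced to ordinary Python predicates.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Rule:
    pattern: Callable[[dict], bool]   # test pattern (consulted only by OR goals)
    action: Callable[[dict], None]    # stands in for calling the rule's subgoal


@dataclass
class Goal:
    name: str
    kind: str                         # "AND" or "OR"
    rules: List[Rule] = field(default_factory=list)

    def learn(self, rule: Rule) -> None:
        """New rules are added at the left, so they are tested first."""
        self.rules.insert(0, rule)


def execute(goal: Goal, state: dict) -> None:
    if goal.kind == "AND":
        for rule in goal.rules:       # AND: every rule, in left-to-right order
            rule.action(state)
        return
    for rule in goal.rules:           # OR: the first rule whose pattern matches
        if rule.pattern(state):
            rule.action(state)
            return
    raise LookupError(f"no rule of {goal.name} matched")


def always(state: dict) -> bool:      # the null pattern always matches
    return True


# A toy SUB1COL: T and B hold digits (None means blank), A receives the answer.
sub1col = Goal("SUB1COL", "OR", [
    Rule(lambda s: s["B"] is None, lambda s: s.update(A=s["T"])),     # B is blank
    Rule(lambda s: s["T"] < s["B"], lambda s: s.update(A="BORROW")),  # T<B (stub)
    Rule(always, lambda s: s.update(A=s["T"] - s["B"])),              # otherwise
])


def run_column(top, bottom):
    state = {"T": top, "B": bottom, "A": None}
    execute(sub1col, state)
    return state["A"]
```

Because the rules are tried in order, the blank-bottom rule shields the T<B rule from ever
comparing a digit with a blank, and a rule added with learn pre-empts all older rules.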

The procedure of figure 2-6 will play a role in illustrations of later sections. It is one of the
procedures acquired by traversing the H lesson sequence. It is worth a moment to explain what it
does informally. The root goal, START, and its subgoal SUB simply initialize column traversal to
start with the units column. 1/SUB chooses between three subgoals. MULTI is for multiple-column
problems. REGROUP is for "regrouping" exercises that don't involve any subtraction at all. This
subgoal is left over from learning regrouping separately from multi-column subtraction (i.e., from
lesson L3). Normally, 1/SUB never calls it. The third goal, Write, is for single-column
subtraction problems. The "main loop" of multi-column traversal is expressed by MULTI as a tail
recursion: MULTI calls itself via its subgoal SUB/REST. SUB1COL processes a column. It chooses
between three methods for doing so. If the bottom of the column is blank, it copies the top of the
column into the answer via the subgoal SHOW2. If the top digit of the column is less than the
bottom, it calls BORROW. Otherwise, it writes the difference of the two digits in the answer.
BORROW has two subgoals: 1/BORROW calls REGROUP, and 2/BORROW just takes the difference in
the column and writes it in the answer. REGROUP is a conjunction of borrowing into the column
that originates the borrow (BORROW/INTO) and borrowing from the adjacent column
(BORROW/FROM). In this procedure, BORROW/FROM occurs before BORROW/INTO. It would be
equally correct to reverse their order, but that is not the way that Heath teaches them. Borrowing
into a digit is just adding ten to it. Borrowing from the next column is also easy when its top digit
is non-zero: the digit is decremented. If the digit is zero, it calls BFZ. BFZ regroups, which causes
the zero to be changed to ten, then it decrements the ten to nine.
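The informal description above can be traced with a short Python sketch. It follows the goal
structure of figure 2-6 (column traversal, SUB1COL, REGROUP, BORROW/FROM, BFZ) but is only an
approximation under stated assumptions: digits are held in lists rather than in a parsed problem
state, crossing out and overwriting are modelled by assignment, borrowing into a digit is written
as adding ten, and the function names are invented for the illustration.

```python
def regroup(top, i):
    """REGROUP column i: BORROW/FROM the left-adjacent column, then BORROW/INTO i."""
    borrow_from(top, i - 1)
    top[i] += 10                      # BORROW/INTO: add ten to the digit


def borrow_from(top, i):
    """BORROW/FROM: decrement the digit, regrouping first if it is zero (BFZ)."""
    if top[i] == 0:
        regroup(top, i)               # 1/BFZ: the zero is changed to ten
    top[i] -= 1                       # decrement (ten becomes nine in the BFZ case)


def subtract(minuend, subtrahend):
    if not 0 <= subtrahend <= minuend:
        raise ValueError("sketch assumes 0 <= subtrahend <= minuend")
    top = [int(d) for d in str(minuend)]
    bottom = [int(d) for d in str(subtrahend)]
    bottom = [None] * (len(top) - len(bottom)) + bottom   # None marks a blank
    answer = []
    for i in range(len(top) - 1, -1, -1):                 # start with the units column
        if bottom[i] is None:                             # SHOW2: copy the top digit
            answer.append(top[i])
            continue
        if top[i] < bottom[i]:                            # BORROW: regroup first
            regroup(top, i)
        answer.append(top[i] - bottom[i])                 # write the difference
    return int("".join(str(d) for d in reversed(answer)))
```

For example, subtract(607, 238) borrows across the zero in the tens column: the 6 becomes 5, the
0 becomes 10 and then 9, and the 7 becomes 17.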

2.4 The representation of grammars

It is obvious that students who can solve mathematical problems must have some
understanding of the syntax of mathematical notation. The student's knowledge of the notation's
syntax is called a grammar. Grammars are formalized as two-dimensional context-free grammars.
Figure 2-7 displays a grammar for subtraction notation. The grammar representation language has
not been subjected to the careful development that the procedure representation language has.
Consequently, its conventions are, for the most part, matters of convenience rather than theoretical
hypotheses. Nonetheless, it is worth going through the grammar representation just to show what
kinds of knowledge need to be represented and to note the few places where critical hypotheses lie.
Grammars have two kinds of rules:

1. Category redundancy rules have the form X --> Y where the right side has just one category.
   This means that everything that is in category Y is also in category X. Thus, DIGIT --> 5
   means that all 5's are digits. Several category redundancy rules may be abbreviated as one
   rule by using commas in the right-hand side, e.g., SIGN --> +, - means that both + and
   - are signs. The last six rules of figure 2-7 are category redundancy rules.

   SIGNED/GRID --> SIGN CGRID              ; HORIZ
   CGRID       --> ACOL (ACOL)+ (ACOL)     ; HORIZ
   ACOL        --> COL (DIGIT)             ; VERT BARRED
   COL         --> CELL (%DIGIT)           ; VERT UNBARRED
   XNUM        --> NUM (/NUM)+ /NUM        ; VERT UNBARRED
   NUM         --> DIGIT (DIGIT)+ DIGIT    ; HORIZ

   DIGIT       --> ID/ELT, 2, 3, 4, 5, 6, 7, 8, 9
   ID/ELT      --> 0, 1
   SIGN        --> +, -
   NUM         --> DIGIT
   CELL        --> DIGIT, XNUM
   PROBLEM     --> SIGNED/GRID

Figure 2-7

A grammar for multi-column addition or subtraction problems



2. Part-whole rules have the form X --> Y Z where the right side has two or more categories.
   Part-whole rules define aggregate categories in terms of their parts. The rule X --> Y Z
   means that X can be composed of parts Y and Z. Whenever one has a Y and a Z that are
   situated in the appropriate geometric relationship, one has an X. Part-whole rules bear an
   annotation, located after a semi-colon, that specifies whether the rule's categories are arranged
   in a horizontal, vertical or diagonal line. For instance,

   SIGNED/GRID --> SIGN CGRID ; HORIZ

   means that a signed grid is composed of a sign followed horizontally by a cgrid (CGRID
   stands for "columnar grid").

There are several biases about mathematical notation that have been built into the grammar
formalism. The most important one is the distinction between a tuple and a list. There are two
kinds of part-whole rules, called tuple rules and list rules. Tuple rules are like ordinary context-free
rules in that a rule's left-hand category has exactly the parts mentioned on the right (i.e.,
SIGNED/GRID has exactly the parts SIGN and CGRID). List rules are for defining sequences of
arbitrary length. They have a special format. They have exactly three categories on the right side:
W --> X Y+ Z means that X is the category of the first element of the sequence, Z is the category of
the last element, and Y is the category of the middle elements. The plus sign is what differentiates
list rules from tuple rules. Both tuple and list rules mark optional categories by placing them in
parentheses. For instance, the list rule

   NUM --> DIGIT (DIGIT)+ DIGIT ; HORIZ

means that a number (a NUM) is at least two digits, with arbitrarily many digits in between. The
tuple rule

   ACOL --> COL (DIGIT) ; VERT

means an answer-column (an ACOL) is a column (a COL) with an optional digit under it. There
are other, minor grammar-writing notations in addition to the tuple/list distinction and optionality.*

Some of these grammar-writing notations are more than just a convenience. They are
potential elements of a micro-theory of mathematical syntax. For instance, list rules are included
because the tripartite notion of begin-middle-end of a sequence is hypothesized to be highly salient.
If list rules are absent, sequential categories can still be expressed using only tuple rules. For
instance, a multi-digit number can be expressed by

   NUM --> DIGIT (NUM).

However, this expression loses the idea that the boundary elements of the sequence, the first and
last ones, may be special. Using tuple rules, there is no simple way to indicate, for instance, that
the first digit should be non-zero. List rules bias the grammar to express sequences so that the first
and last elements are special.
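The contrast between the two encodings of a sequence can be made concrete with a small Python
sketch. It is an illustration only: the recognizer functions are assumptions, a horizontal
sequence is represented as a string, and the rule with a non-zero first digit is a hypothetical
variant introduced to show what the list format makes easy to state.

```python
DIGITS = set("0123456789")


def is_digit(symbol):
    return symbol in DIGITS


def is_nonzero_digit(symbol):
    return symbol in DIGITS - {"0"}


def matches_list_rule(symbols, first, middle, last):
    """W --> X (Y)+ Z: a first and a last element with any number of middles."""
    if len(symbols) < 2:
        return False
    return (first(symbols[0]) and last(symbols[-1])
            and all(middle(s) for s in symbols[1:-1]))


def matches_recursive_tuple_rule(symbols):
    """NUM --> DIGIT (NUM): a digit optionally followed by another NUM."""
    if not symbols or not is_digit(symbols[0]):
        return False
    return len(symbols) == 1 or matches_recursive_tuple_rule(symbols[1:])


def is_num_with_nonzero_first(symbols):
    """Hypothetical list rule: NUM --> NONZERO (DIGIT)+ DIGIT."""
    return matches_list_rule(symbols, is_nonzero_digit, is_digit, is_digit)
```

With the list format, constraining the first element is a one-word change to the rule. The
recursive tuple encoding treats every position alike, so it accepts "047" as readily as "407".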

One of the main functions of the grammar is to parse problem states (i.e., interpret them
syntactically). A parse tree is the grammar's interpretation of a particular problem state. It dictates
what groups of symbols are relevant in the current problem state. Figure 2-8 shows a problem state
and the parse tree that results when it is parsed with the grammar of figure 2-7. The 18 nodes of
that parse tree are essentially the only objects that "exist" in the problem state. It is worth a
moment to walk down this parse tree in order to get a feel for how the grammar "views"
subtraction problems. By the way, the grammar given in figure 2-7 is the one used in all Sierra's
subtraction runs. The whole problem is considered a SIGNED/GRID, which has two parts. The left
part is just a minus sign, in this case, although the grammar permits + to fill this role as well.
The right part of the SIGNED/GRID is a CGRID. The grammar defines CGRIDs as lists of ACOLs.
In this case, there are three ACOLs, namely the hundreds, tens and units columns. ACOL is short for
"answer column" because these are exactly its parts: an answer place and a column. In each of
these ACOLs, the answer is a BLK (i.e., blank, which is a dummy category that fills optional
constituents) and the column is a COL. A COL has a top part and a bottom part. The bottom can
be blank, as in the hundreds column. Usually it is a digit. The top part of a COL is a CELL.
CELLs are usually just digits, as they are in all three columns here. However, they can be the kind
of symbol groups that results from scratching out a number and writing another number over it
(called XNUMs in the grammar).

Notice that the grammar does not define any aggregate objects corresponding to the rows of
the problem. Essentially, the grammar says that grouping the symbols into columns is relevant but
grouping them into rows is not. From this perspective, the grammar is a skill-specific ontology. It
defines the natural kind terms that are relevant for the skill.



* To accommodate two-dimensionality, the usual interpretation of constituents for one-dimensional
(string) grammars is modified slightly: the rectangular region occupied by a constituent may not
overlap another constituent's region, nor may a constituent's region include symbols that are not
descendants of the constituent. Certain notational devices violate these conventions. As it turns
out, these are general devices in mathematics, so ways of handling them have been built into the
grammar formalism (as opposed to handling them in individual grammars). These are implemented
using special annotations on part-whole rules. Among the categories on the right side of a rule, /X
means that X must be crossed out, and %X means that X must not be crossed out. After the
semi-colon, BARRED means that the categories in a rule must be separated by vertical or horizontal
bars, and UNBARRED means that the rule's categories must not be separated by bars.

Figure 2-8

A problem state (at upper right) and its parse tree.
Parse nodes are labelled by a unique serial number, followed by the categories of the node,
which are separated by colons. For instance, node 18 is a CELL, a DIGIT and a 5.



2.5 The representation of patterns

Grammars have an intimate relationship with the patterns that appear on AOG rules. That is
why describing rule patterns has been left until now. A pattern is a set of relations, whose
arguments are goal arguments and pattern variables. Patterns do not have logical connectives,
quantifiers, equality relations, functions, or other complexities. From a logical standpoint, a pattern
is a pure conjunction of literals (a literal is a predicate or a negated predicate), and a pattern
variable is interpreted as a Skolem constant. This simplicity is a result of several important
constraints on learning that will be discussed in later chapters. In order to illustrate patterns, a
version of SUB1COL which is slightly different from the SUB1COL of figure 2-6 will be used. Its
definition, with an English rendition of the rules, is shown in figure 2-9. The first pattern tests
whether the bottom (subtrahend) of the column is blank. The second pattern tests whether the top
digit of a given column is less than the bottom digit. The third, null pattern is always true. Both
goal arguments and pattern variables appear in the patterns. AC is SUB1COL's argument. C, T and
B are pattern variables. There are three kinds of relations in patterns:

1. Categorical relations are defined by the grammar. For each category in the grammar, a
   categorical relation is defined. In these patterns, (COL C) and (BLK B) are the only
   categorical relations.

2. Facts predicates are relations that are defined by the procedure. These were discussed earlier. In
   these patterns, (LessThan? T B) is the only facts predicate.

3. Spatial relations are relations that are built into the pattern formalism. There are just six of them:

   (First? S x)          Object x is the first part of some sequential object S.
   (Last? S x)           Object x is the last part of some sequential object S.
   (Ordered? S x y)      Object x comes before y in some sequential object S.
   (Adjacent? S x y)     Object x is adjacent to y in some sequential object S.
   (!Part x y)           Object x is a part of object y.
   (Tuple T x y ... z)   Object T is a tuple composed of objects x, y, z etc.
Although the spatial relations are built in, they depend on the grammar for their meaning. For
instance, since the grammar defines COL to be a vertical category, (Ordered? C T B) means
that T is above B. If COL were a horizontal category, it would mean that T is left of B.

Goal: SUB1COL (AC)   Type: OR

1. {(!Part AC C)            If AC has a part C,
    (COL C)                 which is of category COL,
    (!Part C B)             with a part B
    (BLK B)}                which is blank,
   --> (SHOW2 AC)           then call SHOW2 with AC as its argument.

2. {(!Part AC C)            If AC has a part C,
    (!Part C T)             whose parts are T
    (!Part C B)             and B,
    (Ordered? C T B)        where T is above B,
    (LessThan? T B)}        and the problem state has a number at location T
                            which is less than the number that is at location B,
   --> (BORROW AC)          then call BORROW with AC as its argument.

3. {} --> (DIFF AC)         Otherwise, call DIFF with AC as its argument.

Figure 2-9

Definition for a version of SUB1COL, a goal that processes one column.



Spatial relations, categorical relations and facts predicates are the only relations that patterns
may have. There are many reasons for handling relations this way, but chief among them is the
so-called primitives problem. Any learning theory that describes how knowledge is constructed from
smaller units is open to questioning about its set of primitives: what are the units that are assumed
to be present when learning begins? If the choice of primitives is left for the theorist to decide, and
especially if the theory allows the set of primitives to vary across individuals, then it is usually
possible for the theorist to tailor the predictions of the theory to an unacceptable degree by
carefully selecting the primitives. Under the approach taken here, the theorist can only vary KS0,
the initial knowledge state, in order to tailor the primitives for individual differences or for different
mathematical skills. KS0 includes the grammar, the primitive facts functions and the primitive facts
predicates. The latter are somewhat uncontroversial. The only part of KS0 worth tailoring is the
grammar. Only by modifying the grammar can the theorist manipulate the vocabulary of pattern
relations. The vocabulary of primitive relations cannot be manipulated directly. This goes a long
way toward dealing with the primitives issue, as chapter 13 shows.

During the execution of a procedure, patterns are matched against the parse tree of the
current problem state. However, they are matched differently depending on whether the pattern is
a test pattern or a fetch pattern. The patterns that were just used for illustration came from an OR
goal, SUB1COL. Therefore, they are test patterns. An OR rule is executed only if its test pattern is
true, where truth of a test pattern is defined to be exact matching: a pattern matches exactly if all of
its relations match. If no rule's test pattern is true, a halt impasse occurs (impasses and repairs are
described in a later section). The patterns on AND rules (i.e., fetch patterns) have the same syntax
as test patterns, but they are used differently. When an AND rule is executed, the fetch pattern is
matched to the parse tree, then the bindings of some of its pattern variables are passed to the rule's
subgoal. The truth of fetch patterns isn't particularly useful since fetch patterns don't control the
course of execution. Fetch patterns are matched using closest matching: the matcher uses bindings
for the fetch pattern's variables that maximize the set of relations that match. If more than one
such binding exists, an ambiguity impasse occurs. Other than the difference in how they are
matched, fetch patterns and test patterns have identical syntax and semantics.
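The difference between exact matching and closest matching can be sketched in Python as follows.
The sketch rests on several assumptions that are not part of the theory as stated: parse-tree
objects are plain dictionaries, relations are Python predicates over a binding, candidate
bindings are enumerated by brute force, and "maximizing the set of relations that match" is
simplified to maximizing the number of matching relations.

```python
from itertools import product


class AmbiguityImpasse(Exception):
    """Raised when more than one binding is a closest match."""


def bindings(variables, objects):
    """Enumerate every assignment of parse-tree objects to pattern variables."""
    for combo in product(objects, repeat=len(variables)):
        yield dict(zip(variables, combo))


def score(pattern, binding):
    """Number of relations in the pattern that hold under this binding."""
    return sum(1 for relation in pattern if relation(binding))


def matches_exactly(pattern, variables, objects):
    """Test patterns: true only if some binding satisfies every relation."""
    return any(score(pattern, b) == len(pattern)
               for b in bindings(variables, objects))


def closest_match(pattern, variables, objects):
    """Fetch patterns: return the binding that satisfies the most relations."""
    scored = [(score(pattern, b), b) for b in bindings(variables, objects)]
    best = max(s for s, _ in scored)
    winners = [b for s, b in scored if s == best]
    if len(winners) > 1:
        raise AmbiguityImpasse(winners)
    return winners[0]


# A toy version of the second SUB1COL pattern, restricted to variables T and B.
PATTERN = [
    lambda b: b["T"]["cat"] == "DIGIT",
    lambda b: b["B"]["cat"] == "DIGIT",
    lambda b: b["T"]["row"] < b["B"]["row"],      # T is above B
    lambda b: b["T"]["value"] < b["B"]["value"],  # (LessThan? T B)
]


def column(top, bottom):
    """Two digit objects, one above the other."""
    return [{"cat": "DIGIT", "row": 0, "value": top},
            {"cat": "DIGIT", "row": 1, "value": bottom}]
```

For a column with 3 over 7, the pattern matches exactly, and closest matching returns the same
binding. For 7 over 3, no binding satisfies all four relations, so the pattern is false as a test
pattern. Used as a fetch pattern, it has two bindings that each satisfy three relations, which is
the situation described above as an ambiguity impasse.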
2.6 An introduction to induction

Sierra's learner has two components. One is basically an inductive generalization algorithm or
inducer. The inducer builds a new subprocedure, given a lesson's examples. The other component,
called the deletion unit, removes one or more rules from the subprocedure that the inducer
constructs. The inducer is by far the more complex and important of the two components.
Although inducers are common in AI, they are less common than interpreters and problem solvers
(which are the main components of Sierra's solver). This section reviews some basic concepts of
induction. In the following section, Sierra's learner will be described.

Induction has been defined as the discovery of generalities by reasoning from particulars, or
more succinctly, as generalization of examples. As a species of reasoning, induction has been
studied in many fields under many names. Concept formation, learning by example and grammar
inference are just a few of its names. Dietterich and Michalski (1981) and Cohen and Feigenbaum
(1983) review the literature on symbolic (AI) induction. Biermann and Feldman (1972) and Fu and
Booth (1975) review the literature on pattern induction and grammatical inference. Anderson, Kline
and Beasley (1979) review the literature on prototype formation from an AI perspective. This
section introduces some of the basics of induction.

Winston's early work in inductive learning is a classic illustration of induction (Winston,
1975). It will be used throughout this document to furnish simple illustrations of inductive
principles. His program learns definitions (concepts) for terms that designate structures made of toy
blocks. It does so by examining scenes that have examples of the structure being learned. Figure
2-10 shows some scenes used to teach the concept "arch." When Winston's program compares
scene a with scene b, it discovers that the block that is on top (the lintel) can be either a brick or a
wedge. It happens to have a concept, prism, which includes bricks and wedges. It induces that the
lintel is a prism. If it later saw an example with a pyramid as the lintel, it would generalize still
further, since a pyramid is not a prism. The learner is biased. It is biased toward the most specific
generalization that covers the examples. It won't generalize unless it has to. Until it sees a pyramid
as the lintel, it will stick with prismatic lintels.

An important distinction is the difference between positive and negative examples. A positive
example is an instance of the generalization being taught, and a negative example is not an instance
of the generalization being taught. In the arch-learning illustration, scenes a and b are positive
examples, and scenes c and d are negative examples. The teacher tells the learner which examples
are positive and which are negative. Winston's program made crucial use of near misses, negative
examples that are almost instances of the target concept. Scene c is a near miss. The only thing
that prevents c from being an arch is the fact that its legs are touching. Scene d is not a near miss.

Scene e is a near miss that raises an important issue. It was just mentioned that Winston's
program had a conservative bias with respect to positive examples. It only generalized if it had to.
With respect to negative examples, the program has the opposite bias. In near miss e, two relations
are missing. The left leg is not supporting the lintel, and the right leg is also not supporting the
lintel. Winston's inducer decides that both these support relations are necessary parts of the
generalization. A more conservative learner would decide that either left-leg support or right-leg
support was necessary, but it wouldn't require that both be present for a structure to be an arch. A
conservative learner would accept scenes f and g as arches, but Winston's program would not. For
the conservative learner to learn that both left-leg support and right-leg support were necessary, it
would have to be given both f and g as negative examples. So, Winston's learner has two biases:
conservative for positive examples, and liberal for near misses. It is important to understand what
the biases of an inducer are, since they can be critical in making it learn like a human.

Figure 2-10

Some examples, negative examples and near misses of Winstonian arches.



The concepts of positive and negative examples, near misses and bias have been introduced.
Another key concept is the class of all possible generalizations that the learner can induce. In most
cases, the class of all possible generalizations is determined by a representation language: any
expression that can be constructed in the representation language is a possible generalization.
Usually, the class is infinitely large. The arch learner's representation language is a certain kind of
semantic net language. It uses about a hundred primitive predicates, such as (ISA x 'WEDGE) and
(SUPPORTS x y). A critical feature of the language is that it does not have disjunction. It has no
way to say, for instance, that the lintel is a brick or a wedge. Of course, the language could easily
have disjunctions added, allowing it to say

   (OR (ISA x 'BRICK) (ISA x 'WEDGE))

or perhaps

   (ISA x (ANY-OF 'BRICK 'WEDGE)).

However, Winston chose not to allow disjunction in the language in order to constrain the class of
all possible generalizations, thereby controlling the inducer in an indirect way. This indirect
influence plays a tacit role in the treatment of scenes a and b of figure 2-10. Although the arch
learner is biased to take the most specific generalization that covers the examples, it is not allowed
to induce that the lintel is a brick or a wedge. The representation language forces it to generalize
slightly beyond the evidence and decide that the lintel is a prism. Hence, it recognizes scene h as
an arch, even though it has never seen it before, because the lintel is a prism, a trapezoidal prism.
If the language permitted disjunctions, the learner would not make the inductive leap from "brick
or wedge" to "prism" and hence would not recognize scene h as an arch. In short, the constraints
on the class of all possible generalizations, which are usually determined by the representation
language, exert a crucial control over the character of the inducer's learning. To reiterate, the two
major determinants of induction are the biases of the inducer and the constraints on its possible
generalizations.
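The interaction between the conservative bias and a disjunction-free language can be sketched in
Python. The sketch is an assumption-laden toy, not Winston's program: the ISA hierarchy below is
hypothetical (only the facts that bricks, wedges and trapezoidal prisms are prisms, and that a
pyramid is not, come from the discussion above), and a concept is reduced to a single category
name.

```python
# Hypothetical ISA hierarchy: child category -> parent category.
PARENT = {
    "BRICK": "PRISM",
    "WEDGE": "PRISM",
    "TRAPEZOID": "PRISM",   # the trapezoidal prism of scene h
    "PRISM": "BLOCK",
    "PYRAMID": "BLOCK",
    "BLOCK": None,
}


def ancestors(category):
    """The category itself followed by its ever more general ancestors."""
    chain = []
    while category is not None:
        chain.append(category)
        category = PARENT[category]
    return chain


def most_specific_generalization(examples):
    """Most specific single category covering every example (no disjunction)."""
    common = ancestors(examples[0])
    for example in examples[1:]:
        others = set(ancestors(example))
        common = [c for c in common if c in others]
    return common[0]


def covers(concept, category):
    return concept in ancestors(category)


def disjunctive_generalization(examples):
    """What a language with disjunction could say: exactly the categories seen."""
    return set(examples)
```

After seeing a brick lintel and a wedge lintel, the disjunction-free learner settles on PRISM and
therefore accepts a trapezoidal lintel it has never seen. The disjunctive learner stays with
{BRICK, WEDGE} and does not. A pyramid lintel would force the first learner up to BLOCK.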

The ideas that have been introduced can be summarized as follows: The input to an inducer
is a sequence of examples; the output is a set of expressions in some representation language such
that (1) each expression is consistent with all the examples, and (2) the set of expressions is maximal
with respect to a certain partial order, called a bias (or a simplicity metric, or weighting function).
That is, the output consists of the simplest expressions in the representation language that are
consistent with all the examples.

The definition of consistency and bias varies with the task. For instance, suppose the task is
to induce grammars from strings. The examples are strings, and the task is to induce expressions
(grammars) in some specified grammar-representation language. A grammar is consistent with a
string if the grammar parses it (or more formally, the string is in the language generated by the
grammar). A typical bias is to prefer simple grammars (e.g., fewest rules, or fewest non-terminal
categories). A bias based on counting rules or categories would be a total order, since any two
grammars can be compared. In general, biases are partial orders rather than total orders. For
instance, suppose the bias is to prefer grammar A over grammar B whenever A's rules are a strict
subset of B's rules. This means that certain pairs of grammars will be incomparable; neither may
be a subset of the other. When the bias is a partial order, more than one grammar may be
maximal. That is why the output of an inducer is defined to be a set of expressions, not just a
single expression. A last comment is that bias is applied after consistency, so to speak. In effect,
the inducer first finds all abstractions consistent with the examples, then it finds the maximal
elements of this set.
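The two-stage definition (consistency first, then bias) can be written down directly. The Python
sketch below uses a deliberately crude stand-in for grammars, which is an assumption of the
illustration: each "grammar" is a frozenset of wildcard patterns, and it "parses" a string if any
one of its patterns matches the string. The bias is the strict-subset preference mentioned above.

```python
from fnmatch import fnmatchcase


def parses(grammar, string):
    """Toy consistency check: some rule (a wildcard pattern) matches the string."""
    return any(fnmatchcase(string, rule) for rule in grammar)


def preferred(a, b):
    """Bias: prefer grammar a over grammar b when a's rules are a strict subset of b's."""
    return a < b


def induce(hypotheses, examples):
    # Stage 1: keep the hypotheses consistent with every example.
    consistent = [g for g in hypotheses
                  if all(parses(g, s) for s in examples)]
    # Stage 2: keep those that no other consistent hypothesis is preferred to.
    return [g for g in consistent
            if not any(preferred(other, g) for other in consistent)]


HYPOTHESES = [
    frozenset({"1*"}),
    frozenset({"*0"}),
    frozenset({"1*", "*0"}),
    frozenset({"2*"}),
]
```

Given the single example "10", three hypotheses are consistent, and two of them, {"1*"} and
{"*0"}, survive the bias because neither is a subset of the other. This is why the output must be
a set. Adding the example "11" eliminates {"*0"} and leaves a single maximal element.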

Negative examples and discrimination examples 

Negative examples are very important. The previous discussion of near misses indicated how
important they were for Winston's inductive learner. Another important use of negative examples is
to recover from overgeneralizations. Suppose the target concept is more specific than the concept
that the inducer has at the moment (e.g., the target is PRISM but the inducer has guessed BLOCK).
A negative example can be used to force the inducer to make its guess more specific (e.g., showing
the inducer a negative example that is a block and not a prism). No positive example could force
the inducer to retreat from the overgeneralization in this way (Gold, 1967). A critical issue for this
theory is whether instruction in mathematical skills uses negative examples, and if so, how.

Strictly speaking, a negative example of a mathematical procedure is a problem state sequence
that illustrates an incorrect way to solve the problem. In a textbook, such a worked exercise might
be labelled, "This is a wrong way to do subtraction problems. Do not use this way." I have never
seen such examples in textbooks. However, negative examples do occur in classrooms under several
circumstances. When a teacher has the class solve a problem on the blackboard, incorrect problem
state sequences will sometimes be generated (or partially generated before the teacher stops the
student solving the problem). These incorrect solutions are negative examples. They may even
qualify as near misses. A similar situation occurs when students are doing seatwork. Students
having difficulty often ask for the teacher's help. The teacher may watch them solve a problem,
then point out where the student went wrong. Such incorrect solutions also serve as negative
examples. So, negative examples are not absent in normal instruction. But they are not common,
and they are not used in any methodical way.

There is another kind of example which functions as a negative example in certain ways,
although it is not, properly speaking, a negative example. Solving a problem that doesn't require a
certain subprocedure provides a negative example for induction of the subprocedure. I believe the
traditional name among curricula writers for such examples is discrimination examples. A
discrimination example is one that demonstrates when not to use the subprocedure that is being
taught in the current lesson. For instance, an example that doesn't borrow is a discrimination
example when it appears in the midst of a borrow lesson. Such an example can help the inducer
discriminate the conditions that determine when one should borrow by providing negative instances
of borrowing (i.e., problem states when one should not borrow). With regard to the induction of
the test pattern that governs a new subprocedure, negative instances act as negative examples. So a
discrimination example provides a negative instance of a certain subprocedure's test pattern, but it is
not a negative example.

At first glance, it seems that some textbooks provide discrimination examples and some don't.
This is rather odd. If induction is indeed what students do, and given that induction can proceed
more efficiently when negative instances are available, then it is amazing that some curricula omit
discrimination examples. A closer examination of the textbooks in question reveals that they
actually do have discrimination examples. However, the discrimination examples for a certain
subprocedure do not occur in the introductory lesson on the subprocedure. Instead, they are placed
later. Often they appear in review lessons. Another place they appear is in lessons of subsequent
subprocedures. For instance, discrimination examples for simple, non-zero borrowing are provided
by the examples used to introduce borrowing from zero. In

      5 9
      6 0 7
    - 2 3 8
      3 6 9

subtracting the tens column always provides a negative instance for borrowing. By the time the
solver gets to processing it, the tens column's top digit has been changed to 9, so the column never
requires a borrow. This can be used by the inducer in inferring that T<B is the correct condition
for borrowing. So an ordinary positive example of borrowing-from-zero necessarily provides a
discrimination example for the subprocedure of borrowing. In certain cases, the same example can
be both a positive example and a discrimination example for a certain subprocedure. In short, it
appears that discrimination examples are present, one way or another, although they may occur late
in the lesson sequence.
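
The claim about the tens column can be checked mechanically. The following is a minimal
sketch, not Sierra's solver: it runs the ordinary right-to-left subtraction algorithm and
records, for each column, the top digit T and bottom digit B as they stand when the column is
processed, together with whether a borrow was needed. The function name borrow_instances, the
record layout, and the treatment of blank bottom digits as zeros are assumptions made for
illustration only.

```python
def borrow_instances(top, bottom):
    """Return one (T, B, borrowed) record per column, units column first.

    Assumes top >= bottom >= 0, so every borrow can be satisfied.
    Blank bottom digits are treated as 0 (an illustrative simplification).
    """
    t = [int(ch) for ch in str(top)]
    b = [int(ch) for ch in str(bottom).rjust(len(t), "0")]
    records = []
    for i in range(len(t) - 1, -1, -1):
        borrowed = t[i] < b[i]              # the test T<B on the column as it now stands
        records.append((t[i], b[i], borrowed))
        if borrowed:
            j = i - 1
            while t[j] == 0:                # borrowing from zero: each zero becomes 9
                t[j] = 9
                j -= 1
            t[j] -= 1                       # decrement the first non-zero digit
            t[i] += 10
    return records


# The tens column of 607 - 238 is seen with T = 9, so it is a negative instance.
print(borrow_instances(607, 238))
```

Under these assumptions, the second record for 607 - 238 has T = 9 and no borrow, which is the
negative instance described above.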

At the present time, Sierra's learner is not able to use discrimination examples for a
subprocedure unless they occur in the lesson where the subprocedure is introduced. Consequently,
when a textbook's content is formalized as a lesson sequence, discrimination examples are moved
forward in the sequence. For instance, example b in figure 2-4b is a discrimination example that is
not in the corresponding real lesson. It came from a review lesson that appears a little later in the
textbook. Appendix 7 discusses how this technical limitation will be removed in later versions of
Sierra.

2.7 The learner

A typical inducer's task is completely specified by (1) the representation language, (2) the
definition of consistency, and (3) the definition of bias. Sierra's learner is atypical in that its
specification involves one further constraint: it may add at most one subprocedure per lesson. This
constraint is a fourth kind of constraint. It is essentially an upper bound on the rate at which the
inducer may change its candidate generalizations.

As far as I know, Sierra's learner is the first AI inducer to use a rate constraint. Rate
constraints might be profitably exploited in other applications, such as the knowledge acquisition
phase of knowledge engineering. In fact, a quick survey of the knowledge acquisition literature
reveals an amusing "hole." There is a great deal of complaining about the so-called knowledge
acquisition bottleneck. It is hard to get human experts to formalize their expertise as, e.g.,
production rules. One often heard solution is to have the system learn the knowledge on its own,
e.g., by discovery or by analogy. However, few human experts acquired their knowledge this way.
Most of them didn't discover their knowledge or infer it; they learned it in school or from a mentor.
The "hole" in the knowledge acquisition research is that no one, to my knowledge, is trying to get
their expert system to learn like human experts learn. Such a system would take advantage of the
structure that its mentor places on the instruction. The present research, in its explication of felicity
conditions, should be helpful in building such a knowledge acquisition system. Presumably, such a
system will be easier for human experts to educate than present systems. Because many experts are
experienced teachers, they are more familiar with formatting their knowledge as lesson sequences
than as production rules. Felicity conditions might help solve the knowledge acquisition bottleneck.
Alas, this research is not aimed at such practical (and potentially lucrative) goals. Its aims are
merely scientific.

Representation language, the first of the constraints on Sierra's learner, has been defined
already. This section is devoted to defining the others. They are discussed in the following order:
consistency, subprocedure and bias. The actual algorithm used to implement the inducer is not
discussed here because any algorithm that meets the specifications would do just as well from the
standpoint of the theory.

The definition of consistency 

A procedure is consistent with an example if and only if its solution to the example's problem
is exactly the same problem state sequence as the example itself. This definition captures a
controversial felicity condition. Teachers guarantee that any procedure that always produces a
correct problem state sequence will be acceptable. It matters less what students say or think; they
are evaluated on what they do. Consequently, in order to succeed in school, students need only
induce a procedure that is consistent with respect to the problem state sequences of the teachers'
examples. It's rational that students would take the simplest, most efficient road to success. In fact,
they do. Induction from problem state sequences is just what students seem to do. The felicity
condition captures this whole complex: the teachers' guarantee, the way the guarantee simplifies
learning, and the fact that students actually take advantage of the guarantee by using the simplified
way to learn. This felicity condition is labelled the induction hypothesis in chapter 3, where it is
formally defined.



The definition of subprocedures

The maximum amount of material that may be added to an AOG by the learner during a
lesson is called a subprocedure. In Lisp terms, a subprocedure is like one clause from a COND
statement: it's a new conditional branch that consists of a sequence of several steps, where each
step calls existing code. If procedures are presented as augmented transition nets or ATNs (Woods,
1970; Winston, 1977), then a new subprocedure is a new arc and a new level that is called by the
new arc (see figure 2-11). In AOG terms, a subprocedure consists of several components:
Figure 2-11

The procedure of figure 2-12b drawn as an ATN, where arcs run left to right.
The new subprocedure's arcs are shown with double lines.
1. A new rule, called the adjoining rule, that is added to an existing OR goal called the parent OR.

2. A new AND goal, called the AND. The adjoining rule calls it.

3. One or more new OR goals, called the trivial ORs. The trivial ORs are called by the new AND's
rules. Each trivial OR has a single trivial (i.e., null pattern) rule that calls some existing AND goal.
These existing AND goals are called the kids.

Figure 2-12 illustrates these components of a subprocedure by showing an AOG before and after a
subprocedure has been added. This subprocedure, by the way, is the one acquired from the lesson
of figure 2-4. It teaches how to borrow from zeros. It will be used throughout this section as a
running example. The pre-lesson AOG (figure 2-12a) can borrow only from nonzero digits; the
post-lesson AOG (figure 2-12b) can borrow across zeros. BORROW/FROM is the subprocedure's
parent OR. The adjoining rule connects BORROW/FROM to BFZ. The new AND is BFZ. The trivial
ORs are 1/BFZ and 2/BFZ. The kids are REGROUP and OVRWRT.

The reason subprocedures have the particular structure that they do is the subject of lengthy
argumentation, which will be presented in following chapters. To summarize that argumentation, a
certain felicity condition, one-disjunct-per-lesson, mandates that just one new choice be introduced
into the procedure's structure. This choice is created by adding the adjoining rule to the parent OR.
This means that there is now a new way to achieve that goal. A choice has been added. For
convenience, all places where there could eventually be choices, but so far there are none, are
marked structurally. Thus, the subgoals of the new AND are created as trivial ORs. Trivial ORs
provide a place for later subprocedures to attach. In fact, this subprocedure's parent OR,
BORROW/FROM, was created as a trivial OR for REGROUP.



The definition of bias 

Given a lesson and a procedure, the learner first generates all possible subprocedures that
make the procedure consistent with the lesson's examples, then it uses bias to define the maximal
subprocedures. (Actually, the algorithm is more complex, but the effect is the same.) Bias is
defined by several ordering predicates. Each bias predicate will be stated and discussed in turn.
A>B will be used to indicate that the bias prefers procedure A over procedure B.

Maximally general test patterns 

If two subprocedures, A and B, are equal in every way except that A's adjoining
rule's test pattern is a subset of B's adjoining rule's test pattern, then A>B.

The adjoining rule of a subprocedure is an OR rule, so its pattern is a test pattern. It controls when
the subprocedure will be executed. For instance, in figure 2-12, the adjoining rule connects
BORROW/FROM to BFZ. If its test pattern is true, BORROW/FROM calls BFZ; if it is false,
BORROW/FROM calls a subgoal that simply subtracts one from BORROW/FROM's argument. For this
test pattern to be consistent with the examples, it should be true in all problem states where
BORROW/FROM is the current goal and the subprocedure is invoked by the teacher. Such problem
states are called positive instances. It should be false in problem states where BORROW/FROM is the
current goal and the teacher did not invoke the subprocedure. Such states are called negative
instances. Given the lesson of figure 2-4, the positive instances are states a and b below, and
negative instances are states c and d:

                                                  1 14
    a.  3 0 4     b.  7 0 7     c.  8 2 4     d.  8 2 4
      - 1 2 6       -   2 8       - 5 6 8       - 3 5 8
                                                      6
Figure 2-12

AOGs before (a) and after (b) H's lesson on borrowing from zero.
The new subprocedure's goals are shown in a larger font.
There are often millions of consistent test patterns. For illustration of how the bias affects the
induction of this test pattern, however, we'll consider just these four patterns, not all of which are
consistent (each pattern is followed by its English translation):

1. {}                     Always true.

2. {(0 TD)}               BORROW/FROM's argument is a zero.

3. {(Part! C TD)          BORROW/FROM's argument TD is in a column, AC, that is
    (Part! AC C)          adjacent to the rightmost column in the problem, X.
    (Part! G AC)          That is, BORROW/FROM's argument TD is in the tens column.
    (Part! G X)
    (Adjacent? G AC X)
    (Last? G X)}

4. {(0 TD)                BORROW/FROM's argument TD is a zero, and
    (Part! C TD)          it is in a column, AC, that is
    (Part! AC C)          adjacent to the rightmost column in the problem.
    (Part! G AC)          That is, TD is a zero and it is in the tens column.
    (Part! G X)
    (Adjacent? G AC X)
    (Last? G X)}

Pattern 1, the trivial pattern, is true of both positive instances (indeed, it is always true). But it is
not false of the negative instances. Hence it is not consistent with the examples. Pattern 2 is true
of both positive instances and false of the negative ones. It is consistent. Moreover, since the only
pattern that could be a proper subset of it is {}, which is inconsistent, pattern 2 is maximally
general. It is accepted by the bias. Pattern 3 is true of the positive instances but it is not false of
one of the negative instances (c). Hence pattern 3 is inconsistent. When it is conjoined with
pattern 2, the result, pattern 4, is consistent. But it is not maximally general because it contains
pattern 2 as a proper subset, and pattern 2 is consistent. Pattern 4 is not accepted by the bias. The
bias prefers pattern 2 instead. To put it intuitively, pattern 4 would represent students who believe
that they should only borrow-from-zero for zeros that are in the tens column. Such a belief would
appear in the students' work as a bug. But no such bug has been observed. In order to account for
this fact and many others, the theory adopts a bias toward maximally general test patterns. We
move on to the next bias predicate.
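
The reasoning about the four test patterns can be replayed in a few lines. This is a minimal
sketch under simplifying assumptions, not Sierra's representation: each problem state is reduced
to two attributes (the digit that BORROW/FROM would decrement and the column it sits in), the
six-relation "tens column" description is collapsed into a single atomic test, and the names
INSTANCES, TESTS, consistent and maximally_general are hypothetical.

```python
# Problem states a-d, reduced to the attributes the four patterns talk about.
INSTANCES = {
    "a": {"digit": 0, "column": "tens", "positive": True},       # 304 - 126
    "b": {"digit": 0, "column": "tens", "positive": True},       # 707 - 28
    "c": {"digit": 2, "column": "tens", "positive": False},      # 824 - 568
    "d": {"digit": 8, "column": "hundreds", "positive": False},  # 824 - 358
}

# Atomic tests; a pattern is a set of test names, read as a conjunction.
TESTS = {
    "zero": lambda state: state["digit"] == 0,
    "tens": lambda state: state["column"] == "tens",
}

PATTERNS = [
    frozenset(),                    # pattern 1: always true
    frozenset({"zero"}),            # pattern 2
    frozenset({"tens"}),            # pattern 3
    frozenset({"zero", "tens"}),    # pattern 4
]


def matches(pattern, state):
    return all(TESTS[name](state) for name in pattern)


def consistent(pattern):
    """True of every positive instance and false of every negative one."""
    return all(matches(pattern, s) == s["positive"] for s in INSTANCES.values())


def maximally_general(patterns):
    """Consistent patterns that have no consistent proper subset among the candidates."""
    ok = [p for p in patterns if consistent(p)]
    return [p for p in ok if not any(q < p for q in ok)]


print(maximally_general(PATTERNS))
```

With this encoding only patterns 2 and 4 are consistent, and only pattern 2 survives the bias.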

Lowest parent 

Given two subprocedures, A and B, for possible addition to a procedure P, if A is lower
than B in that there is a path from P's root to A that passes through B, then A>B.

The best way to understand the lowest parent bias is to see an example. Figures 2-12b, 2-13c and
2-13d show three AOGs corresponding to adding different subprocedures to an initial AOG. The
initial AOG is shown in figure 2-12a. All three procedures are consistent with the lesson's examples.
However, subprocedure 2-13c's parent, SUB1COL, is higher than subprocedure 2-12b's parent,
BORROW/FROM. Subprocedure 2-13d's parent, 1/SUB, is higher still. The lowest parent bias
prefers 2-12b. Essentially, 2-12b represents the idea that the new subprocedure, BFZ, is a kind of
borrowing. 2-13c represents the idea that BFZ is a way to process columns (i.e., there are three
kinds of columns: easy, non-borrow columns; harder, borrowing columns; and super hard,
borrow-from-zero columns). 2-13d represents the idea that BFZ is a way to process a whole problem
(i.e., there are two kinds of problems: regular problems and problems that require BFZ).
Figure 2-13

Subprocedures, in large font, that are attached to higher parents.
There is a somewhat different perspective on the lowest parent bias: the lower the parent, the
more general the subprocedure. For instance, 2-12b and 2-13c can cope with four column
borrow-from-zero problems, such as a below,

    a.  5 0 7 2     b.  5 0 0 2
      - 1 1 9 1       - 1 1 1 9

but 2-13d cannot, because its subprocedure is attached above MULTI, the loop across columns.
2-12b can borrow from multiple zeros, as in b, but neither 2-13c nor 2-13d can. So the lowest
parent bias is a bias toward increasing the applicability of the new subprocedure.

This bias has a certain elegant relationship to the test pattern bias. Note, first of all, that test
patterns and parent ORs are two aspects of the same thing: The test pattern expresses external
conditions on when to call the new subprocedure, and the parent OR expresses internal conditions
on when to call it, i.e., what goal must be current in order to call it. (In ordinary production
systems, test patterns and parent ORs would be syntactically indistinguishable because they would
both be conditions in the left-hand side of a rule.) Given this duality, their respective biases ought
to be the same, and they are. The test pattern bias is toward maximizing the applicability of the
subprocedure. The lowest parent bias is also towards maximizing applicability. To put it a little
differently, these two biases both say that students would prefer to risk errors of commission (i.e.,
executing the new subprocedure when it really shouldn't be executed) rather than risk errors of
omission. On to the next bias predicate.

Maximally specific fetch patterns 

If two subprocedures, A and B, are identical except for their fetch patterns, and each
fetch pattern on A's new AND's rules is a superset of or equal to the corresponding fetch
pattern in B, then A>B.

This bias prefers the largest, most specific patterns for fetch patterns. This makes sense, given the
role that fetch patterns play. The basic problem that a fetch pattern solves is deciding which of the
many visible objects (e.g., which digit or which column) a subgoal should use as its arguments. In
order to maximize the fetch pattern's power to discriminate, the learner remembers everything about
the lesson's examples that might prove useful in fetching; that is, it remembers maximally specific
patterns. It does so in order that problem solving can approximate the lesson situation as closely as
possible. If some idiosyncrasy of the lesson's examples is stored, no harm is done. Although the
idiosyncratic relations won't match during problem solving, fetch patterns are matched closely rather
than exactly, so the fetch will succeed anyway.

Besides inducing patterns, the learner builds actions for each of the new rules. This
sometimes involves inducing nests of functions. Functions are typically needed whenever the
worked example introduces a new number, a number that is not equal to one of the numbers
visible in the current problem state. These new numbers are usually the result of some facts
function that is performed invisibly by the teacher (or textbook). In the first example of the lesson
(see figure 2-4a), a 9 is introduced in problem state d. Some possible candidates for the function
nest that generates the 9 follow (English presentations have been substituted for the pattern
variables that would normally appear as the arguments of Read):
1. (QUOTE 9)

2. (Add (Read <original top digit of hundreds>) (Read <bottom of units>))

3. (Sub1 (Add (Read <top of tens>) (Read <original top of tens>)))

4. (Sub1 (Read <top of tens column>))

Function nest 1 is a constant. It turns out to be a correct function nest. Function nest 2 will be
filtered out by the lesson's next positive example, 707-28, because 7+8≠9. The third nest will
never be filtered out by any example since the subprocedure is only called when the top of the tens
column is zero. However, this function nest is ruled out by the show-work principle. The
show-work principle is a felicity condition that states that examples of a new subprocedure are expected
by the student to show all their work. What this means, in practice, is that facts functions won't be
nested by Sierra's learner. To do so, as in the third nest above, is to hide an intermediate result
instead of writing it down. Apparently, students don't believe that the teacher will do that, so they
never bother to consider nested facts functions.* The fourth function nest is logically equivalent to
the first. It too is consistent with all the lesson's examples. However, the use of Sub1 instead of
the constant 9 has a subtle effect on local problem solving, which allows one to detect which one
students prefer. They prefer the constant. The following bias expresses that preference and others
like it:



Smallest arity

If two subprocedures, A and B, are identical except for a function nest, and the arity of
A's nest is smaller than the arity of B's nest, then A>B, where the arity of a function nest
is the sum of the number of argument places in its functions (i.e., constants and nullary
functions count 0, unary functions count 1, binary functions count 2, etc.).



This is a rather minor bias that has a clear intuitive interpretation. Suppose that executing binary
facts functions requires greater use of cognitive resources than executing a unary facts function.
The bias then means that students prefer function nests which reduce their cognitive load during
execution. There are just two more bias predicates left to discuss.
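
The arity measure is easy to compute once a function nest is written as a nested structure.
The sketch below is illustrative only. It assumes that a nest is a Python tuple whose first
element is the function name, that (QUOTE 9) is treated as the bare constant 9, and that Read
counts as a unary function; the text does not say how Read is counted, so that last choice in
particular is an assumption. The names arity and NESTS are hypothetical.

```python
def arity(nest):
    """Sum of the argument places of every function in a nest."""
    if not isinstance(nest, tuple):      # a constant or a pattern variable counts 0
        return 0
    _name, *args = nest
    return len(args) + sum(arity(arg) for arg in args)


# The four candidate nests for the 9, with pattern variables shown as strings.
NESTS = {
    1: 9,
    2: ("Add", ("Read", "original top digit of hundreds"), ("Read", "bottom of units")),
    3: ("Sub1", ("Add", ("Read", "top of tens"), ("Read", "original top of tens"))),
    4: ("Sub1", ("Read", "top of tens column")),
}

for number, nest in NESTS.items():
    print(number, arity(nest))
```

Whether or not Read is counted, the constant has arity 0 and nest 4 has a larger arity, so the
bias prefers nest 1 over nest 4, as stated above.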



Fewest kids

If two subprocedures, A and B, have the same parent OR, and A has fewer kids
than B, then A>B.

Lowest kids 

If two subprocedures, A and B, have the same parent OR and the same number of
kids, and each of A's kids is lower than or equal to the corresponding kid in B,
then A>B, where "lower" is defined as in the lowest parent constraint.



These last two biases were discovered by trial and error. Although they are needed in order to
improve the theory's predictions, I have, as yet, only a speculative interpretation for them, which is
discussed in chapter 19. What makes these biases confusing is that they are opposing biases.
Chapter 19 shows how the fewest-kids bias increases the generality of the subprocedure while the
lowest-kids bias decreases it.



* We can infer this by relaxing the show-work principle and seeing if the resulting predictions are
accurate. If the show-work principle is relaxed slightly so that facts functions can be nested one
deep, then approximately 450,000 distinct function nests are induced for this lesson. Many of them
lead to star bugs.
A summary of the learner's operation

All the criteria for the learner have been defined. The representation language and
consistency were defined. Rate constraints were defined by defining subprocedures. The biases
were defined. Any algorithm that satisfies these specifications would serve as an implementation for
Sierra's learner.* Sierra's actual algorithm is not far from a brute force algorithm. However, it uses
some tricks to reduce the computational resources required. For instance, it uses a version space to
represent the set of all consistent test patterns (see section 18.1 and Mitchell, 1982).

It was mentioned earlier that Sierra's learner has a second, minor component, called the
deletion unit, that deletes one or more rules from the subprocedure(s) produced by the inducer.
The deletion unit's operation is quite simple. Suppose the inducer has just produced a
subprocedure whose new AND has n rules. The deletion unit produces 2^n-2 new subprocedures,
one for each non-trivial subset of the AND rules. If the new AND has two rules, R1 and R2, then the
deletion unit produces two new subprocedures. One has a new AND with just R1. The other has a
new AND with just R2. Chapter 7 discusses why deletion must be a part of the learner. The bottom
line is that several observed bugs can't be generated without it.
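
The count of 2^n-2 follows from enumerating every subset of the AND's rules except the empty
set and the full set. A minimal sketch of that enumeration is given here. It is not Sierra's
deletion unit: rules are stood in for by strings, and the function name deletion_variants is
hypothetical.

```python
from itertools import combinations


def deletion_variants(rules):
    """All non-trivial subsets of an AND's rules, preserving rule order.

    The empty subset and the complete rule list are excluded, which leaves
    2**n - 2 variants for n rules.
    """
    n = len(rules)
    return [list(subset)
            for size in range(1, n)
            for subset in combinations(rules, size)]


print(deletion_variants(["R1", "R2"]))
print(len(deletion_variants(["R1", "R2", "R3"])))
```

For two rules this yields the two single-rule ANDs described above; for three rules it yields six.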

2.8 Core procedure trees for the Southbay experiment

In order to illustrate the way the learner's inducer works and to start the discussion of
observational adequacy, this section illustrates the inducer's performance when given a particular
subtraction lesson sequence, the one called H in section 2.2. Sierra's inducer is one-to-many in that
it may produce more than one output procedure from a single input procedure and a lesson. This
comes out clearly in figure 2-14, which shows the core procedure tree for the learner. The core
procedure tree shows which procedures are derived from which other procedures. The initial
procedure is at the top. It is called "1c" because it can only do one-column problems. The links in
the core procedure tree are labelled with the lesson names. Thus, lesson L1 produces procedure
2c-full. The remainder of this section is a "walk" down the core procedure tree.

The first lesson, L1, teaches how to solve problems of the form NN-NN. The resulting
procedure, 2c-full, can do two column problems, where both columns are "full." The new AND of
the subprocedure introduced by this lesson is labelled MULTI in figure 2-6, which is the AOG from
the procedure labelled "ok" in the core procedure tree. Henceforth, the new AND's names will be
indicated in square brackets so that the reader can follow along in figure 2-6. Lesson L2 teaches
how to solve incomplete tens columns, producing a procedure called 2c that can do any two column
problem that does not require borrowing [SHOW]. Lesson L3 introduces regrouping offline, so to
speak [REGROUP]. It uses examples that are not subtraction problems. The subtraction procedure
that results from this lesson, 2c-regroup, can do both regrouping exercises and two-column
subtraction problems, but it cannot do two-column subtraction problems that require borrowing.
That capability is taught by lesson L4.



* There are interactions among the biases, so they must be applied in the following order: lowest
parent, fewest kids, lowest kids, smallest arity, and maximally specific fetch patterns. The bias for
maximally general test patterns must be applied after the lowest parent bias, but it is independent of
the others.



Figure 2-14

The core procedure tree of lesson sequence H, with lesson labels on the left.



Lesson L4 integrates regrouping into the column-traversal algorithm [BORROW]. Notice that
there are two output procedures, 2c-1bor and 2c-1bor-leq. Part of what lesson L4 teaches is when to
borrow. It uses examples like 34-18 to show when to borrow (positive examples), and examples
like 34-13 to show when not to borrow (discrimination examples). However, lesson L4 does not
include examples like 34-14, where the units column's digits are equal. Hence, the learner has no
way to tell whether the test for borrowing should be T<B or T≤B. Sierra's learner thus produces
two procedures: procedure 2c-1bor borrows when T<B, and procedure 2c-1bor-leq borrows when
T≤B. This constitutes a prediction that some students will borrow when T=B, as in 34-14. This
is a correct prediction. The corresponding bug, which is called N-N-Causes-Borrow, has been
observed.
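
The ambiguity between the two borrowing tests can be shown directly. The sketch below is
illustrative only: it represents each example by the top and bottom digits of its units column
together with whether the teacher borrowed, and the names HYPOTHESES, LESSON and surviving are
hypothetical.

```python
import operator

# The two candidate tests for borrowing, as predicates on (T, B).
HYPOTHESES = {"T<B": operator.lt, "T<=B": operator.le}

# Units-column digits (T, B) and whether the teacher's example borrows there.
LESSON = [
    ((4, 8), True),     # 34 - 18, positive example
    ((4, 3), False),    # 34 - 13, discrimination example
]


def surviving(hypotheses, examples):
    """Names of the tests that agree with the teacher on every example."""
    return [name for name, test in hypotheses.items()
            if all(test(t, b) == borrowed for (t, b), borrowed in examples)]


print(surviving(HYPOTHESES, LESSON))
print(surviving(HYPOTHESES, LESSON + [((4, 4), False)]))   # adding 34 - 14
```

Both tests survive the lesson as given; adding an example such as 34-14 leaves only T<B.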

Lesson L5 teaches how to solve three column problems [recursive call to MULTI]. It
produces three procedures. Procedures 3c-full and 3c-full-Blk are almost identical. The only
difference is the test pattern that they use to tell when to recurse. For procedure 3c-full, the test is
whether the current column is the leftmost column in the problem; if it is not, then the procedure
recurses. For 3c-full-Blk, the test is whether there are any unanswered columns left. Both
procedures lead ultimately to correct subtraction procedures (the ones labelled ok and ok-Blk). The
intermediate "Blk" procedures, however, generate star bugs. Certain repairs, which attempt to omit
answering a column, will cause these Blk procedures to go into an infinite loop trying to answer the
column that was left blank. This whole branch of Blk procedures shouldn't be generated. In a
moment, the underlying problem will be discussed. The third procedure output from lesson L5,
3c-full-P100, passes arguments to the recursive call a little strangely. Whereas both 3c-full and
3c-full-Blk pass the rightmost unanswered column through the recursion, 3c-full-P100 passes the
leftmost column (hence its suffix, P100, which abbreviates "passing the hundreds column"). This
only works correctly for three-column problems. Procedure 3c-full-P100 will get stuck if it is given
a four-column problem. In fact, it and all its descendants are star procedures due to the strange
ways that they answer problems with four or more columns. This branch of the core procedure tree
shouldn't be generated. Chapter 10 examines the problem underlying the Blk branch and the
P100 branch. The blame is laid on a missing piece of common sense knowledge. The theory
should provide some schema for recognizing and building loops as iterations across "similar" objects
in a problem state. That is, the knowledge representation language should have, in addition to AND
goals and OR goals, a Foreach goal. Such a goal executes its body once for each object in a
sequence of objects, e.g., each column in a sequence of columns. The current lack of such a goal
forces the learner to build a recursion in order to traverse columns, and this causes the Blk and
P100 groups of star bugs.

Lesson L6 teaches how to solve three column problems of the form NNN-N [SHOW2]. This
lesson is a fabrication. At about this point in the Heath textbook, NNN-N problems start
appearing in the practice exercises, but there is no specific lesson on the subskill. Lesson L6 has
been included in the lesson sequence in order to get around the missing Foreach loop problem. If
column traversal were structured around a Foreach loop, then the lesson that teaches how to solve
NN-N problems (lesson L2) would suffice to teach how to do a partial column that occurs
anywhere in the problem. Since there is no Foreach loop in the current knowledge representation
language, omitting L6 from the lesson sequence means that all the procedures generated from H
will be unable to solve NNN-N problems. In particular, all the procedures will manifest one of
the two star bugs shown below when they are run through the solver:

                                                        7 9
    *Skip-Interior-Bottom-Blank:       3 4 5     3 4 5     8 0 7
                                     -     2   -   2 2   -     9
                                       3   3 X   3 2 3 ✓   7   8 X

                                                        7 9
    *Quit-When-Second-Bottom-Blank:    3 4 5     3 4 5     8 0 7
                                     -     2   -   2 2   -     9
                                           3 X   3 2 3 ✓       8 X

(In this and following examples, X marks wrong answers and ✓ marks correct ones.) In particular,
the learner will be unable to generate a correct subtraction procedure. The proper way to avoid
these star bugs would be to study the Foreach problem, then make the appropriate changes to the
representation language. I haven't done that yet. In the interest of seeing what the theory would
generate if that were done, L6 was added to H. Lesson sequence SF does not have such a lesson.
All its predictions involve one of the two star bugs above.

Lesson L7 teaches how to solve three column problems when one of the columns (but only
one) requires borrowing. This lesson refines the fetch pattern that determines where to do the
decrement during borrowing. Prior to L7, the fetch pattern would return both the left-adjacent
(tens) column and the leftmost (hundreds) column for borrows that originate in the units column
(recall the discussion of the bug Always-Borrow-Left in section 1.1). This lesson modifies the fetch
pattern so that only the left-adjacent column is fetched. All the previous lessons have added new
subprocedures; lesson L7 does not. It only modifies existing material. Lesson L8 is similar. It
teaches how to do problems with two adjacent borrows. It does not add a new subprocedure, but
only adjusts some fetch patterns. It produces a procedure 3c-2bor (or 3c-2bor-Blk, or 3c-2bor-P100)
that can correctly solve any three column problem that does not involve borrowing from zero.

Lesson L9 teaches how to borrow from zero (BFZ). It produces two procedures that are
identical except for their test patterns. The test for 3c-bflz (or 3c-bflz-Blk, etc.) is {(0 X)}, which
causes the procedure to borrow from zero if the digit to be decremented is a zero. The test pattern
for 3c-bflid is {(ID/ELT X)}, which makes the procedure borrow across zeros and ones. ID/ELT
is a categorical relation that is defined by the grammar (see figure 2-7) to be true of both kinds of
identity elements. Procedure 3c-bflid corresponds to an observed bug, Borrow-Treat-One-As-Zero.
The reason the learner produces two procedures is that the lesson is missing a crucial example, one
where the digit to be borrowed from is a one (e.g., 514-9). Without this example, the learner can't
discriminate which of the two possible test patterns is right. By the way, this illustrates one of the
few ways that the grammar has been tailored in order to improve the theory's predictions. If
ID/ELT were taken out of the grammar, then this bug could not be generated. In general, Sierra's
predictions are not particularly sensitive to the grammar. If the grammar works, in that it provides
correct parses for all the problem states that the procedures produce, then the model generates
about the same set of predictions. The ID/ELT case is the exception to this general finding.
Anyway, the learner finishes up by taking lesson L10, which teaches how to borrow across multiple
zeros. This lesson has no effect on 3c-bflz, since the procedure can already do that.
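
The way a missing example leaves two test patterns standing can be sketched in a few lines. This
is only an illustration, not Sierra's induction algorithm: the encoding of a training example as a
(digit, borrowed-across) pair, the function names and the lesson contents are all assumptions
made for the sketch.

```python
# Hypothetical encoding (not from the source): a training example is a pair
# (digit_to_decrement, teacher_borrowed_across_it).

def zero_only(digit):
    """Stands in for the test pattern {(0 X)}."""
    return digit == 0

def identity_element(digit):
    """Stands in for {(ID/ELT X)}: true of both identity elements, 0 and 1."""
    return digit in (0, 1)

def consistent(test, examples):
    """A candidate test survives if it agrees with every example."""
    return all(test(digit) == borrowed for digit, borrowed in examples)

def surviving_tests(examples):
    """Names of the candidate test patterns that the examples cannot rule out."""
    candidates = {"{(0 X)}": zero_only, "{(ID/ELT X)}": identity_element}
    return [name for name, test in candidates.items() if consistent(test, examples)]

# Illustrative lesson with no example whose borrowed-from digit is a one.
lesson = [(0, True), (5, False), (3, False)]

# The same lesson plus a 514-9 style example: the tens digit is a one and
# the teacher simply decrements it instead of borrowing across it.
lesson_with_one = lesson + [(1, False)]
```

On the first lesson both candidates survive, which corresponds to the learner emitting two
procedures; adding the example with a one eliminates the ID/ELT pattern.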

The core procedure tree has two 2-way branches and one 3-way branch. It could have as
many as 2x2x3 = 12 final procedures. In fact, there are just 2. The other branches are pruned
when the learner is unable to assimilate the next lesson. For instance, the branch for T≤B as the
test for borrowing (i.e., procedure 2c-1bor-leq) is terminated at lesson L5 because one of the
examples in that lesson is

      9 8 5
    - 6 2 5
      3 6 0

The procedure expects the units column to have a borrow, but the example does not have a borrow
there. The learner could install a new subprocedure that would avoid borrowing whenever T=B.
However, lesson L5 is already introducing a new subprocedure. The learner cannot introduce two
subprocedures in one lesson because that would violate the one-disjunct-per-lesson felicity condition.
So this branch of the core procedure tree is pruned. Intuitively, such pruning represents
remediation.
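
The mismatch that prunes this branch can be checked mechanically. The sketch below is an
assumption-laden illustration rather than Sierra's assimilation test: it compares two candidate
borrow tests against the 985 - 625 example column by column, and the helper names are invented
for the purpose.

```python
def columns(top, bottom):
    """Pair up digits from the units column leftward (equal-length numbers assumed)."""
    top_digits = [int(c) for c in reversed(str(top))]
    bottom_digits = [int(c) for c in reversed(str(bottom))]
    return list(zip(top_digits, bottom_digits))

def predicted_borrows(test, top, bottom):
    # Simplification: each column is tested on its original digits, which is
    # adequate here because the worked example contains no borrows at all.
    return [test(t, b) for t, b in columns(top, bottom)]

def less_than(t, b):
    """Borrow test T<B."""
    return t < b

def less_or_equal(t, b):
    """Borrow test T<=B, the test on the branch that is pruned."""
    return t <= b

# 985 - 625 as worked in the lesson: no column borrows (units, tens, hundreds).
example_borrows = [False, False, False]
```

The T<B test reproduces the example, whereas the T≤B test predicts a borrow in the units column
(5 over 5) that the example does not show.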

It might seem that the model is doing a very poor job of explaining where students' bugs
come from. It seems to explain only two bugs, N-N-Causes-Borrow and Borrow-Treat-One-As-
Zero. This is no great feat. Any inductive account of learning could explain these two bugs since
their "causes" lie in the absence of certain crucial training examples. However, the real test of the
learner is not what bugs it produces directly, but what structures it assigns to the procedures that it
produces. A procedure's structure has a direct effect on deletion and local problem solving. By
examining the bugs produced by the solver, one can ascertain whether the procedure's structures are
plausible or not.

2.9 The solver

Each of the procedures output by the learner is given, one by one, to the solver. The solver
"takes" a diagnostic test by applying the procedure to solve each problem on the test. The solver
has two parts, called the interpreter and the local problem solver. The interpreter executes the
procedure. The local problem solver executes repairs whenever the interpreter's execution is halted
by an impasse. To describe this in more detail: When the interpreter reaches an impasse, the local
problem solver selects one of a set of repairs. Applying the selected repair changes the internal
state of the interpreter in such a way that when the interpreter resumes, it will no longer be stuck.
The local problem solver may (or may not) create a patch, which will cause the same repair to be
chosen if ever that impasse occurs again. Stable bugs are accounted for by creating and retaining
patches for long periods; bug migrations result from short-term patch retention. By systematically
varying the choice of repairs and the use of patches during repeated traversals of the test, the set of
all predictions that can be generated from the given procedure can be collected.

This section describes how the interpreter and the local problem solver work. For
illustrations, it uses the procedure whose AOG is sketched in figure 2-12a. The procedure is called
3c-2bor in figure 2-14. It doesn't "know" how to borrow from zero. It can solve problems like a
or b, but not ones like c:

    a.    2 3        b.   4 5 1        c.   5 0 7
        - 1 7           -   8 7           -   2 8

This section is constructed as a scenario that traces the execution of the AOG on problem c. The
AOG reaches an impasse when it tries to borrow from the zero. Local problem solving repairs the
interpreter's state, allowing the interpreter to finish the problem. Depending on the repair selected
by the local problem solver, various bugs are generated.

By convention, AOGs are started by calling their root goal on the initial problem state. In this
case, START is called with the whole subtraction problem as its argument. START doesn't do
anything interesting. It merely calls SUB (if addition were part of this procedure's competence,
START would have to decide whether to call SUB or ADD). SUB's purpose is to find the units
column and pass it to 1/SUB. 1/SUB's job is to decide whether the given problem is a regrouping
problem, a single-column subtraction problem or a multi-column subtraction problem. It finds that
there are several columns to be subtracted, so MULTI is called with the units column as its
argument.

MULTI is an AND goal that implements a loop across the columns of the subtraction problem.
AND goals execute their rules in left-to-right order. MULTI executes its first rule, which calls
SUB1COL and passes the units column as its argument. SUB1COL is an OR goal that chooses a
method for answering its column. OR rules are tested in left-to-right order. SUB1COL's first rule
tests for a blank in the bottom of the current column, which is the units column of 507-28. The
bottom of the column is 8, so the test fails. SUB1COL's second rule tests whether the top digit of
the column is less than the bottom digit. Since 7<8, the second rule is executed, and BORROW is
called. BORROW is an AND goal, whose definition is:

    BORROW (T B A)    Type: AND

    1. {} --> (1/BORROW T)

    2. {} --> (2/BORROW T B A)

As the first rule is executed, 1/BORROW is passed the value of BORROW's first argument, which
happens to be the parse node for the top digit of the units column (node 31 in figure 2-8). The
usual call-by-value semantics applies for argument passing. 1/BORROW gets T's value rather than its
intension (e.g., lazy evaluation, or call-by-name). The issue of passing intensions versus extensions
is a complex one, which is discussed in appendix 8.
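
The left-to-right testing of OR rules described above for SUB1COL can be sketched as follows.
This is a minimal illustration, not the interpreter itself: only the first two tests come from
the text, while the third rule and the action names SHOW-TOP and DIFF are hypothetical
placeholders.

```python
def first_matching_rule(rules, top, bottom):
    """Return the action of the first rule whose test is true, scanning left to right."""
    for test, action in rules:
        if test(top, bottom):
            return action
    return None  # no rule applies

# Each rule is (test, action name). A blank bottom cell is modelled as None.
SUB1COL_RULES = [
    (lambda t, b: b is None, "SHOW-TOP"),  # hypothetical action name
    (lambda t, b: t < b, "BORROW"),        # second rule, as in the text
    (lambda t, b: True, "DIFF"),           # hypothetical default rule
]
```

For the units column of 507-28 the call `first_matching_rule(SUB1COL_RULES, 7, 8)` skips the
blank test and selects BORROW, matching the scenario.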

1/BORROW is a trivial OR goal that calls REGROUP. REGROUP's purpose is to "regroup" a
ten to become ten ones. REGROUP is an AND goal that first calls BORROW/FROM and passes it the
top digit-place in the column that is left-adjacent to REGROUP's argument. BORROW/FROM is a
trivial OR goal, whose definition is:

    BORROW/FROM (TD)    Type: OR

    1. {} --> (OVRWRT TD (Sub1 (Read TD)))

BORROW/FROM's argument, TD, is currently bound to the parse node for the top digit of the tens
column (node 27 in figure 2-8). Since the rule's pattern is null, it always matches. The action, like
all rule actions, is an evaluable form, in the Lisp sense. It will attempt to call the subgoal OVRWRT,
passing it the value of TD and a number. The number will be calculated by the function nest
(Sub1 (Read TD)). In this case, the function nest tries to subtract one from the top digit of the
tens column, which is zero (the problem is 507-28). Trying to decrement zero violates a
precondition of Sub1. Violating a precondition causes an impasse. Local problem solving is
initiated to find a way to change the state of the interpreter in order to make it continue.
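
A precondition violation of this kind is easy to mimic. The sketch below is an assumption, not
Sierra's code: the problem state is reduced to a dictionary from cell names to digits, the
exception class and function names are invented, and the impasse label follows the style of the
node labels in the impasse-repair tree.

```python
class Impasse(Exception):
    """Raised when the interpreter cannot continue without local problem solving."""

def sub1(digit):
    """Decrement a digit; decrementing zero violates the precondition."""
    if digit == 0:
        raise Impasse("Pcv: Sub1 Zero?")
    return digit - 1

def borrow_from(cells, td):
    """Mirror of (OVRWRT TD (Sub1 (Read TD))) over a dict of cell -> digit."""
    cells[td] = sub1(cells[td])

def try_borrow_from(cells, td):
    """Run borrow_from and report either success or the impasse label."""
    try:
        borrow_from(cells, td)
        return "ok"
    except Impasse as impasse:
        return str(impasse)

# Top row of 507-28.
top_row = {"hundreds": 5, "tens": 0, "units": 7}
```

Calling `try_borrow_from(top_row, "tens")` reports the impasse and leaves the page untouched,
which is the situation the local problem solver has to repair.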

There are five kinds of impasses. Precondition violations, such as the one just discussed, are
one kind. For completeness, the other four are listed below, but will not be discussed further:

1. Halt: OR goals may only run rules that have (a) not been run before in attempting to satisfy
   this invocation of the goal, and (b) have true test patterns. If there are no such rules, then a
   halt impasse occurs. The interpreter can't decide which rule to run, so it invokes local
   problem solving.

2. Ambiguity: The fetch patterns on AND rules are used to bind certain pattern variables
   (nicknamed "output" variables) whose values are then used in the rule's action. If a fetch
   pattern matches ambiguously, so that there are two or more values for an output variable,
   then an ambiguity impasse occurs. The interpreter can't decide which way to match the fetch
   pattern, so it invokes local problem solving.

3. Infinite loop: If the interpreter detects an infinite loop (e.g., because the goal stack depth
   exceeds some very large threshold), then an infinite loop impasse occurs.

4. Crazy notation: If the parser is unable to parse the current problem state, which means that
   it is not syntactically well-formed with respect to the grammar, then an impasse occurs.

Returning to the scenario, figure 2-15 shows the interpreter's state as local problem solving begins.
The interpreter's state consists of a goal stack and a mode bit called microstate. Microstate
indicates whether the interpreter is calling (microstate = Push) or returning (microstate = Pop).
The format of the interpreter's state is important because the interpreter's state is where the local
problem solver does its problem solving. The general idea of local problem solving is that the local
problem solver can do anything it wants to the interpreter's state as long as it leaves the state set so
that the interpreter will continue. The local problem solver cannot change the problem state (i.e.,
write symbols on the page); it can only change the interpreter's state.

    Microstate = Push

    (BORROW/FROM (CELL 27))
    (REGROUP (CELL 31))
    (1/BORROW (CELL 31))
    (BORROW (CELL 31) (DIGIT 12) (BLK 34))
    (SUB1COL (CELL 31) (DIGIT 12) (BLK 34))
    (MULTI (CELL 31) (DIGIT 12) (BLK 34))
    (1/SUB (CELL 31) (DIGIT 12) (BLK 34))
    (SUB (PROBLEM 53))
    (START (PROBLEM 53))

Figure 2-15

The interpreter's state (microstate and the goal stack) at the time of the impasse.
Goal arguments are shown as the main category and the serial number of the parse node (see fig. 2-8).



Which changes the local problem solver chooses is left open to individual variation so that the
model will capture the fact that different subjects repair the same impasse different ways. (Indeed,
the same subject may even repair the same impasse different ways at different times, which accounts
for bug migration.) However, unrestricted changes to the interpreter's state give the local problem
solver a great deal of power. It could, for example, run the AOG in some kind of simulation mode.
The model would thus be able to generate just about anything by hypothesizing the appropriate
local problem solving. In short, a tricky problem of repair theory is to constrain the local problem
solver in such a way that the theory is refutable, but still empirically successful. The current version
of repair theory postulates five operators, called repairs, that modify the interpreter state:

Noop     pops the stack. When the interpreter resumes, it will think the top goal has been
         accomplished. Essentially, this repair makes the interpreter skip the stuck goal, turning it
         into a null operation, or "no op" in computer jargon.

Backup   pops the stack to the highest (most recently called) OR goal, then sets microstate to Push.
         This will cause the interpreter to choose a different OR rule to call. Put intuitively, the
         "student" decides to back up to the last place that a choice was made in order to go the
         other way instead.

Quit     pops the stack back to the root goal of the AOG, then sets microstate to Pop.
         Intuitively, the "student" decides to give up on this problem and go on to the next test
         item.

Refocus  resets the arguments of the top goal in such a way that the precondition is no longer
         violated. It does so by rematching the most recently used fetch pattern. This causes the
         interpreter to execute the top goal with different arguments, "shifting its focus of
         attention" to avoid the impasse. (Figure 2-16 shows Refocus applied two different ways
         to the impasse currently under discussion.)

Force    has different effects depending on the kind of impasse it repairs. If the impasse is a halt
         impasse, where none of the rules have true test patterns, then the Force repair will pick
         one of the rules and cause the interpreter to execute it. If the impasse is an ambiguity
         impasse, where a fetch pattern matches several ways so that it is ambiguous which values
         to pass as subgoal arguments, then the Force repair will pick one of the possible matches
         and cause the interpreter to use it.

Figure 2-16

(a) Refocus relaxes a relation that says that the column to borrow from should be adjacent to the
    column borrowed into. This causes the decrement to be placed in the hundreds column. This
    application of the repair generates a very common bug (42 occurrences in the Southbay data)
    called Borrow-Across-Zero.

(b) Refocus relaxes a relation that says that the cell to borrow from should be the first (top) cell in
    the column. This application of the repair generates a rare bug (1 occurrence in the Southbay
    data) called Borrow-From-Bottom-Insteadof-Zero.
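
Three of the repairs are simple enough to sketch as operations on the interpreter's state. The
sketch below is a loose illustration under several assumptions that the text does not settle:
the goal types given to START, SUB and 1/SUB are guesses, Noop is assumed to leave microstate at
Pop, and Backup is assumed to discard the stuck goal and stop at the nearest enclosing OR goal
without checking whether that goal has untried rules. Refocus and Force are omitted because they
depend on fetch patterns.

```python
from dataclasses import dataclass

@dataclass
class InterpreterState:
    stack: list      # (goal name, goal type) pairs, root first, top last
    microstate: str  # "Push" (calling) or "Pop" (returning)

def noop(state):
    """Pop the stuck goal so the interpreter treats it as accomplished."""
    return InterpreterState(state.stack[:-1], "Pop")

def backup(state):
    """Pop to the nearest enclosing OR goal, then resume in calling mode."""
    stack = state.stack[:-1]  # discard the stuck goal itself
    while stack and stack[-1][1] != "OR":
        stack = stack[:-1]
    return InterpreterState(stack, "Push")

def quit_problem(state):
    """Pop back to the root goal, then resume in returning mode."""
    return InterpreterState(state.stack[:1], "Pop")

# The goal stack at the impasse in the 507-28 scenario.
IMPASSE_STATE = InterpreterState(
    stack=[("START", "AND"), ("SUB", "AND"), ("1/SUB", "OR"), ("MULTI", "AND"),
           ("SUB1COL", "OR"), ("BORROW", "AND"), ("1/BORROW", "OR"),
           ("REGROUP", "AND"), ("BORROW/FROM", "OR")],
    microstate="Push",
)
```

Each function returns a new state and leaves its argument alone, so the same impasse state can
be handed to every repair in turn, which is convenient when all repairs are to be tried.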



Given these repairs, local problem solving is simple; it is just selection and application of a repair.
However, this simple regime can generate quite complex bugs. Repairs often cause secondary
impasses. Since they don't actually fix the underlying defect in the student procedure but rather
just get the interpreter running again in the most expedient way, they often leave the problem in a
state that will cause further impasses. Repairing those impasses may lead to tertiary impasses. In
principle, there could be an arbitrarily long causal chain. In practice, one rarely sees chains longer
than six impasse-repair occurrences.

The above description of the local problem solver is a little simplified. There are a few
complications concerning the creation and use of patches. A patch is an association of a repair and
impasse that the local problem solver creates in order to cache (save) the results of a particular
occurrence of local problem solving. Another complication, whose discussion will be put off for
later, concerns critics, which block the selection of repairs under certain circumstances.

To return to the scenario, suppose that the local problem solver chooses the Noop repair.
This causes the BORROW/FROM goal to return to REGROUP, having made no changes in the initial
problem state. REGROUP, which is an AND goal, goes on and executes its second rule, which calls
BORROW/INTO, passing it the parse node for the top digit in the units column. BORROW/INTO's
definition follows:

    BORROW/INTO (TD)    Type: OR

    1. {} --> (OVRWRT TD (Concat (One) (Read TD)))

This OR goal merely "adds" ten to the given digit by concatenating a one to it, and has
OVRWRT write the "sum" over the digit. In this case, calling BORROW/INTO yields the problem
state in figure 2-17b. Control returns to BORROW, popping BORROW/INTO, REGROUP and
1/BORROW on the way. BORROW executes its last rule, which takes the column difference for the
units column. Now the problem appears as in figure 2-17c. BORROW is popped, and control
returns to MULTI. The procedure is done with the units column. It still has the tens column and
the hundreds column left to do. These are processed uneventfully, with no impasses, so it is best to
stop the scenario here. Figure 2-17 shows the remaining problem states between here and the end
of the problem's solution.

Figure 2-17

Sequence of problem states, omitting crossing-out actions, for Stops-Borrow-At-Zero.



This solution to the subtraction problem involved taking the Noop repair to the impasse. The
solution is characteristic of a common bug (64 occurrences in the Southbay data) called Stops-
Borrow-At-Zero. Taking other repairs would produce other solutions to the exercise, some of which
will be bugs. Figure 2-16 illustrates two bugs generated by taking the Refocus repair instead of the
Noop repair on this exercise.
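
The effect of the Noop repair on this exercise can be reproduced with a small simulation. This
is a sketch under stated assumptions, not Sierra's interpreter: it hard-codes column subtraction
over digit lists, and the flag `noop_at_zero` (an invented name) stands in for applying Noop
whenever the digit to be decremented is a zero.

```python
def subtract(top, bottom, noop_at_zero=False):
    """Column subtraction for top >= bottom >= 0.

    With noop_at_zero=True, a borrow whose decrement would land on a zero
    skips the decrement but still adds ten to the borrowing column.
    """
    assert top >= bottom >= 0
    t = [int(c) for c in reversed(str(top))]     # top digits, units first
    b = [int(c) for c in reversed(str(bottom))]  # bottom digits, units first
    b += [0] * (len(t) - len(b))                 # treat blanks as zeros

    def borrow_from(j):
        if t[j] == 0:
            if noop_at_zero:
                return          # Noop repair: the stuck decrement is skipped
            borrow_from(j + 1)  # correct procedure: borrow across the zero
            t[j] = 10
        t[j] -= 1

    answer = []
    for i in range(len(t)):
        if t[i] < b[i]:
            borrow_from(i + 1)
            t[i] += 10          # borrow into the current column
        answer.append(t[i] - b[i])
    return int("".join(str(d) for d in reversed(answer)))
```

With the flag off, `subtract(507, 28)` gives the correct 479. With the flag on it gives 489:
the units column is answered as 17 - 8, and the tens column then borrows from the hundreds in
the ordinary way.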

When Sierra is given a diagnostic test and a procedure, it will generate solved tests
corresponding to all possible combinations of repairs to the impasses it encounters. This varying of
repairs to impasses is a prolific source of predictions. Figure 2-18 displays this by sketching the
impasse-repair tree for this procedure when it "takes" the diagnostic test shown in figure 2-1. The
solver reaches its first impasse on the test's 14th problem, 102-39, because the problem requires
borrowing from a zero. Each of the leftmost branches in the tree corresponds to a different way to
repair this impasse. The six nodes are labelled with the impasse: "Pcv: Sub1 Zero?" identifies it as
a precondition violation impasse where Sub1's error test, Zero?, is true when Sub1 was called.
The letter following the impasse identification is a code for the repair that was applied: Q for Quit,
B for Backup, F for Force, N for Noop and R for Refocus. There are always at least five branches
for each impasse because there are five repairs. There may be more, because some repairs can
apply more than one way. Notice that there are two nodes labelled with R among the first six
branches. These correspond to two ways to apply the Refocus repair. If a repair is not applicable
to a certain impasse or the repair fails to fix the impasse when it is applied, then the corresponding
node has "F," for filtered, as a prefix.

Some of the repairs lead to further impasses. When this happens, the node has a subtree to
its right (e.g., the Backup repair to this impasse). On the other hand, if the test can be completed
without further local problem solving, the node is a leaf of the tree and it has a number as its
prefix. The number is an index into the table beneath the tree. For instance, solved test 1 has
exactly the answers produced by the bug Borrow-Won't-Recurse. Solved test 2 generates exactly the
answers produced by a set of three bugs. (Actually, the "bugs" with exclamation points in their
names are called "coercions." See appendix 2.)


Sierra's solver has a switch that controls whether it will generate bug migrations or not. The
above impasse-repair tree was generated by turning off bug migration. This causes the solver to use
patches so that whenever an impasse occurs that has occurred before on the test, the solver will
apply the same repair that it chose before. If bug migration were left on, then the solver would
generate a huge number of solved tests. Essentially, each occurrence of the original impasse (i.e.,
the borrow-from-zero impasse in this case) would yield an impasse-repair tree. There are seven
borrow-from-zero columns on the diagnostic test. Hence, Sierra would generate approximately 20^7
solved tests if bug migration were left on. Most of these would be identical, probably, but still
Sierra would have to generate them all if observational adequacy were to be thoroughly assessed.
Needless to say, this is not what is done. For practical reasons, observational adequacy is assessed
only with respect to bugs, not bug migrations.

     1.  {Borrow-Won't-Recurse}
     2.  {Borrow-Across-Zero !Touched-Zero-Is-Ten !Touched-Double-Zero-Is-Quit}
     3.  {Borrow-Across-Zero !Touched-Double-Zero-Is-Quit}
     4.  {Borrow-Across-Zero !Touched-0-Is-Quit}
     5.  {Borrow-Across-Zero !Touched-0-N=Blank !Touched-Double-Zero-Is-Quit}
     6.  {Borrow-Across-Zero !Touched-0-N=N !Touched-Double-Zero-Is-Quit}
     7.  {Borrow-Across-Zero !Touched-0-N=Blank !Touched-Double-Zero-Is-Quit}
     8.  {Borrow-Across-Zero !Touched-0-N=0 !Touched-Double-Zero-Is-Quit}
     9.  {Borrow-Across-Zero !Touched-0-Is-Quit}
    10.  {Borrow-Across-Zero !Touched-0-Is-Quit}
    11.  {Borrow-Across-Zero !Touched-0-Is-Quit}
    12.  {Stops-Borrow-At-Zero}
    13.  {Borrow-Add-Decrement-Insteadof-Zero}
    14.  {Blank-Insteadof-Borrow-From-Zero}
    15.  {Smaller-From-Larger-Insteadof-Borrow-From-Zero}
    16.  {Blank-Insteadof-Borrow-From-Zero}
    17.  {Top-Insteadof-Borrow-From-Zero}
    18.  {Borrow-Won't-Recurse}
    19.  {Borrow-Won't-Recurse}
    20.  {Borrow-Won't-Recurse}

Figure 2-18

(a) Impasse-repair tree for the core procedure 3c-2bor.
(b) Debuggy's analysis of each of the 20 predicted test solutions.
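
The difference between running with patches and running with bug migration can be illustrated by
counting repair assignments. The sketch below is a deliberate simplification and not the solver's
algorithm: it ignores secondary impasses, filtered repairs and repairs that apply in more than
one way, and the function names are invented. The final helper reflects one reading of the
estimate in the text, namely that each of the seven occurrences independently contributes a tree
of about 20 leaves.

```python
from itertools import product

REPAIRS = ("Noop", "Backup", "Quit", "Refocus", "Force")

def repair_assignments(occurrences, use_patches):
    """Yield one tuple of repairs (one entry per impasse occurrence) per solved test."""
    if use_patches:
        # Bug migration off: the repair chosen first is reused at every occurrence.
        for repair in REPAIRS:
            yield (repair,) * occurrences
    else:
        # Bug migration on: every occurrence is repaired independently.
        yield from product(REPAIRS, repeat=occurrences)

def count_with_migration(leaves_per_tree, occurrences):
    """Number of solved tests if each occurrence multiplies in a whole tree of leaves."""
    return leaves_per_tree ** occurrences
```

Under these assumptions `count_with_migration(20, 7)` is 1,280,000,000, which makes plain why
exhaustive generation with bug migration is impractical.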




Observational adequacy 

In principle, the following simple procedure is used to test the observational adequacy of the
theory:

1. Administer diagnostic tests to a large number of students.

2. Collect the test sheets and code the answers into machine-readable form.

3. Analyze each test solution with Debuggy, thus redescribing it as a set of bugs.

4. Call the set of all test solutions (represented as bug sets) OST, the observed test solutions.

5. Formalize the textbooks used by the students, producing several lesson sequences.

6. Formalize an initial state of knowledge, KS0.

7. Run Sierra's learner over each lesson sequence. This produces one core procedure tree per lesson
   sequence.

8. Formalize a diagnostic test form.

9. For each procedure in each core procedure tree, run Sierra's solver over the diagnostic test. This
   produces one impasse-repair tree per core procedure.

10. For each leaf of each impasse-repair tree (except the leaves representing filtered repairs), analyze
    the leaf's solved test with Debuggy. This produces one bug set per solved test.

11. Call the set of all such bug sets PST, the predicted solved tests.

12. Calculate OST∩PST, OST-PST and PST-OST.

13. Separate the star bugs, if any, from PST-OST.
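
The comparison at the end of this procedure is ordinary set arithmetic over bug sets. The sketch
below shows it on made-up miniature data: the bug names are taken from this chapter, but the
contents of the two sets are invented for illustration and are not the Southbay results.

```python
def compare(ost, pst):
    """Return (OST ∩ PST, OST - PST, PST - OST) for two sets of bug sets."""
    return ost & pst, ost - pst, pst - ost

# Each test solution is a frozenset of bug names so that it can be a set member.
# Made-up miniature data, for illustration only.
OST = {
    frozenset({"Stops-Borrow-At-Zero"}),
    frozenset({"Borrow-Across-Zero"}),
    frozenset({"Always-Borrow-Left"}),
}
PST = {
    frozenset({"Stops-Borrow-At-Zero"}),
    frozenset({"Borrow-Across-Zero"}),
    frozenset({"Borrow-Won't-Recurse"}),
}

confirmed, unexplained, unobserved = compare(OST, PST)
```

Here `confirmed` holds the predictions that were observed, `unexplained` the observations the
model fails to generate, and `unobserved` the predictions from which any star bugs would then be
separated.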

In practice, things are more complex. Steps 5 and 6 involve tailoring. Trying different lesson
sequences led to the discovery that omitting the regrouping lesson causes the model to generate
several new, valid predictions. In principle, the initial knowledge state, KS0, should be a rich source
of variation since it is likely that not all students have the same initial understanding. In the
Southbay experiment, just one KS0 was used. Two others were tried, briefly, but they produced
almost the same observational adequacy as the chosen KS0.*

The steps that involve Debuggy, steps 3 and 10, are actually quite a bit more complex than
described so far. In fact, most of the rest of this section will be concerned with the practical aspects
of using bug-based observational adequacy. First, the reality of step 3, analyzing the observed test
solutions, will be briefly described (VanLehn, 1981, covers it in detail). Then the reality of step 10,
analyzing the test solutions generated by Sierra, will be described. Finally, the Southbay numbers
for OST∩PST, OST-PST and PST-OST will be presented.



* The observed bug Borrow-From-Bottom-Insteadof-Zero can be generated by modifying the
  grammar rule that defines COL. The observed bug Zero-Insteadof-Borrow can be generated by
  modifying the primitive Sub function so that it implements max(0, x-y) instead of |x-y|. These
  are the only observed bugs that are generated by non-standard KS0 (that I know of) and not by
  the standard KS0.
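
The two versions of the Sub primitive mentioned in this note differ only when the top digit is
smaller than the bottom digit. A minimal sketch, with invented function names:

```python
def sub_standard(x, y):
    """Standard primitive: the absolute difference |x - y|."""
    return abs(x - y)

def sub_zero_floor(x, y):
    """Modified primitive: max(0, x - y), so a too-small top digit yields zero."""
    return max(0, x - y)
```

For a column such as 3 over 7, the standard primitive answers 4 while the modified one answers
0; whenever x is at least y the two agree.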






Analyzing the Southbay data

Not all students have bugs. Some students know the correct algorithm. Others migrate
among several bugs during the test. In one experiment the students fell into five categories in
roughly the following proportions:



     14   (10%)   Perfect score. No errors of any kind.
     61   (46%)   Knows the correct algorithm; errors due to slips alone.
     34   (26%)   Has a bug or a set of bugs (plus perhaps some slips as well).
     14   (10%)   Intra-test bug migration (plus perhaps some bugs and slips).
     11    (8%)   Errors cannot be analyzed.
    134  (100%)   total



These proportions vary with the grade level. The above proportions are for third graders tested late
in the year. In general, the older the student population, the greater the proportion of students in
the slips category and the smaller the proportion in the bugs and bug-migration categories. In the
early third grade, for example, students in the bugs categories constitute over 50% of the sample
instead of 26%. This shift is not surprising. In the early grades, the students have not yet been
taught the whole algorithm. When given the diagnostic test, they will have to do local problem
solving to answer most of its items. Hence, younger students are more likely to have bugs, and
older students are more likely to have only slips.

The figures just given come from a special experiment where students were given two tests on
consecutive days with no intervening instruction (this experiment is called the Short-term study in
VanLehn, 1981). For each problem on one test, there was a very similar problem on the other test.
These matched-item tests were designed to provide enough redundancy that cases of bug migration
could be found. The usual twenty-item test is too short for one to have confidence in bug
migration analyses.

The two-test experiment allowed assessment of the short-term stability of students' errors.
Various kinds of errors are expected to have differing kinds of short-term stability. Slips are
expected to vary widely over two tests given a short time apart. There may be no slips on one test
and several on another. If there are slips on both tests, they are not expected to occur on the same
problems. Impasses, on the other hand, are expected to remain in evidence across tests. Because
impasses derive from the student's core procedure, and it is assumed that core procedures are stable
in the short term, impasses should be stable in the short term as well. An impasse may show up
differently on the two tests; it might manifest as a bug on one test and as a different bug or as
intra-test bug migration on the next test. What would be unexplained is an impasse that is present
on one test but absent on the other. These considerations prompt the following tabular summary of
the percentage of students exhibiting the various kinds of stability:



      3    (4%)   No errors on either test.
     32   (49%)   Stable correct procedure; changes due to slips alone.
      3    (4%)   Stable bugs; changes due to slips alone.
     12   (18%)   Stable impasses; changes due to repairs (often along with slips and stable bugs).
     13   (19%)   Appearing and disappearing impasses (with slips and stable bugs).
      4    (6%)   Errors cannot be analyzed.
     67  (100%)   total



The stability patterns of the students in the first four categories (75%) conform to expectations,
while the behavior of the students in the remaining two categories (25%) remains unexplained.

These stability data show that the older view of errors as due to either bugs (deterministic,
repeatable errors) or slips (underdetermined, stochastic errors) is incomplete. On that view, bugs
were stable and only slips could account for short-term instabilities. The impasse/repair notions
contribute substantially to our ability to understand short-term instabilities (in addition to their role
as an explanation of the acquisition of bugs).

However, a significant proportion of the tests (8% of the static, one-test data, and 25% of the
stability data) cannot yet be analyzed even with these advances. Most of these students are in the
unanalyzable category because the tests were simply not long enough to give the analysts a large
enough sample of their behavior. Without a large and variegated set of errors, it is sometimes
impossible to disambiguate the various possible explanations for the student's errors. Such
ambiguous analyses are counted in the category unanalyzable. In other cases, species of behavior
that have not yet been formalized were apparent. Some students appeared to "punt" the test by
struggling through the first part of it, then giving up and using some easily executed buggy
procedure. There seemed to be several cases of cheating by looking at a neighbor's paper. In short,
there will undoubtedly be some errors that have rather uninteresting causes and hence can properly
be left unanalyzed. My belief is that we have not quite reached that level of understanding yet.
There probably remain some undiscovered, interesting mechanisms that may further our
understanding of errors as much as the impasse/repair process did.

The figures given above were derived by hand analysis of the matched-test data. This is
necessary because Debuggy cannot analyze bug migrations. It can only find bugs that are stable
across the whole test. Its analyses of the same data, and the Southbay data, are shown below:





                                    Southbay         Short-term
    No errors                        96  (10%)        14  (10%)
    Errors due to slips alone       198  (20%)        41  (31%)
    Has a set of bugs               340  (34%)        35  (26%)
    Errors cannot be analyzed       377  (36%)        44  (33%)
    total                          1013 (100%)       134 (100%)



Notice that about a third of the students could not be analyzed by Debuggy. The main reason that
so many students could not be diagnosed by Debuggy is that they were making too many slips
(VanLehn, 1981, discusses this issue in detail). However, there was also some bug migration among
the unanalyzable students, as well as a non-trivial amount of truly puzzling behavior. For Debuggy
to do better, it would have to have more redundant test items. Then it could locate slips in the way
that the human analysts did, by comparing a student's performance on identical or nearly identical
problems. On the other hand, Debuggy is totally objective. Unlike me, it does not "hope" for the
occurrence of certain bugs that would confirm the predictions of the theory. In service of
objectivity, its opinions are used throughout this document as the definition of "bug occurrence."
Also, subsequent references to the Southbay data will include the Short-term data as well.

Debuggy cannot invent new bugs. Its inventiveness is limited to creating new sets of bugs.
Debuggy has a database of bugs that it combines to form analyses. (Creating a new set of bugs
may sound trivial, but it is actually quite difficult since many bugs interact with each other in
complex ways. See Burton, 1982.) The method used to discover new bugs for the database is to
use Debuggy as a filter to remove students whose behavior is adequately characterized by existing
bugs. This leaves the human analysts to concentrate on discovering any systematicity that lurks in
what Debuggy considers unsystematic behavior. When even the barest hint of a new bug is
uncovered by the experts, it is formalized and incorporated in Debuggy's database. That way,
Debuggy will discover any subsequent occurrences of the bug, even when it occurs with other bugs,
and even when it interacts in non-linear, complex ways with those bugs. At the end of the
Southbay experiment, the database had grown to 103 bugs. The tests had been thoroughly
examined by myself and two other analysts. We were confident that few bugs, if any, lurked
undiscovered in the data. As will be seen shortly, our confidence was misplaced. Sierra invented
some bugs that we did not think of, and six of them turned out to be observed bugs!

Generating and analyzing the predicted test solutions

To generate the Southbay predictions, Sierra's learner traversed lesson sequences H, SF and
HB, producing three core procedure trees. In principle, all core procedures in all three core
procedure trees would be submitted individually to Sierra's solver, along with the appropriate
deletion-generated core procedures. This would be 63 core procedures, for the trees just given.
However, many of the core procedures are quite similar. Others are known to be "star" core
procedures in that all the solved tests that they will generate will be marred by star bugs (e.g., the
P100 branch of H's tree is a "star" branch; as discussed in section 2.8, it would be blocked by
adding a "Foreach" loop to the knowledge representation language). Running these redundant
and/or star core procedures through the solver would only generate more instances of already
generable bugs, not bugs that could not be generated some other way. Consequently, a subset of
the 63 core procedures was selected and run through the solver.* Thirty core procedures were
submitted to Sierra's solver, generating 30 impasse-repair trees. The trees' leaves yielded 893
solved tests. The solved tests were analyzed by Debuggy.

The analysis of predicted test solutions has to be more stringent than the analysis of observed test
solutions. Basically, a predicted test solution counts as analyzed only if Debuggy's bug set for it
exactly matches its answers. Inexact analyses don't make sense, since Sierra does not make slips,
nor does it do bug migration (bug migration is turned off in the solver when generating the
impasse-repair trees). However, exact matching turned out to be too stringent a criterion. In
Debuggy's versions of certain bugs, there is special code inserted to handle rare cases, such as the
bug running off the left edge of the problem while borrowing. In Sierra, such cases are handled
automatically by the usual local problem solving mechanism. However, the special case code in
Debuggy's bugs occasionally would not correspond to any of the various impasse-repair
combinations that Sierra generated. The net effect is that none of Sierra's solved tests would exactly
match the Debuggy bug's performance. In almost all cases, the analysis was off by one problem.
That is, Debuggy's analysis would match 19 out of 20 answers on a solved test, but the 20th
problem's answer would not match exactly (although the rightmost few digits would often be the
same). Consequently, the analyses were divided into two classes: perfect and almost perfect. The
latter class is the off-by-one analyses.
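The perfect / almost perfect distinction can be sketched in a few lines. This is only an illustration: the function name and the list-of-answers representation are assumptions made here, not a description of how Debuggy stores or compares analyses.

```python
def classify_analysis(bug_set_answers, solved_test_answers):
    """Classify a match between a bug set's answers and one solved test.

    Both arguments are lists of answers, one per test problem
    (a hypothetical representation chosen for this sketch).
    """
    if len(bug_set_answers) != len(solved_test_answers):
        raise ValueError("both answer lists must cover the same problems")
    # Count the problems on which the two answer lists disagree.
    mismatches = sum(
        1 for a, b in zip(bug_set_answers, solved_test_answers) if a != b
    )
    if mismatches == 0:
        return "perfect"
    if mismatches == 1:
        return "almost perfect"  # the off-by-one class
    return "not matched"


# A 20-problem test on which only the last answer differs.
solved = list(range(100, 120))
predicted = solved[:19] + [0]
print(classify_analysis(predicted, solved))  # almost perfect
```

Counting mismatches, rather than testing for equality, is what lets the 19-out-of-20 case be kept apart from both exact matches and genuinely different solutions.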




* The subset included all the core procedures from the H core procedure tree, except the P100
branch and the Blk branch. From the SF core procedure tree, only the procedures that are in a
direct line from the root to the "ok" procedure were run. From the HB core procedure tree, only
three procedures were run: 3c-lbor, 3c-lbfe and 3c-lbfid. All the deletion-generated procedures
from H and HB were run, except for those that are generated before three-column subtraction is
taught. Since a Foreach loop would change the early procedures' structures, the procedures that
would be generated from them by deletion would be different as well.


                   Good Proc's       No Loop Proc's       All Proc's

No errors            2   (1%)           0   (0%)            2   (0%)
Perfect            209  (83%)         289  (46%)          498  (56%)
Almost perfect      28  (11%)          91  (14%)          119  (13%)
Analyzable           4   (2%)         145  (23%)          149  (17%)
Unanalyzable         8   (3%)         111  (17%)          125  (14%)
total              251 (100%)         642 (100%)          893 (100%)

Figure 2-19
The 893 solved tests generated by Sierra, categorized by how well Debuggy's analysis matched.



At first, Debuggy's database of bugs was insufficient to analyze very many of the solved tests.
Only 1% of the tests could be perfectly or almost perfectly analyzed. To solve this problem,
Debuggy's database was expanded from 103 bugs to 147 bugs. The 44 added bugs included star
bugs as well as bugs that could plausibly be observed bugs. Any bug was added that would get
more of the solved tests analyzed. However, a point of diminishing returns was reached. Figure
2-19 shows the number of solved tests at the point where I stopped adding bugs to the database.
The solved tests are separated into two groups corresponding to two groups of core procedures.
The "Good" group (the left column of the figure) contains solved tests from core procedures that
"know" how to loop across columns. These would presumably be roughly the same when the
"Foreach" loop problem is fixed. The other group (middle column) comes from core procedures
that suffer the effects of not being able to process multi-column problems. I tended to add few
bugs to the database in the service of their analysis since I expect that they will not be with us
much longer. The figures reflect this. Enough bugs were added to the database to analyze 95% of
the "Good" solved tests, but only 60% of the solved tests in the other set were analyzable.

After the 44 bugs were added to Debuggy's database to accommodate Sierra, the Southbay data
was reanalyzed. Six of the 44 bugs turned up in the analyses.* Two of them even occurred rather
frequently, seven times each. This was quite a surprise.

Results 

When Debuggy analyzed the 1147 observed test solutions that constitute the Southbay data, it
found bug sets for 375 of them. However, many of the solved tests received the same bug set. The
eleven most common analyses are listed in figure 2-20. One can see that the frequency of
occurrence falls off rapidly. There are 134 distinct bug sets. Most of them (99) occur only once.
Appendix 2 lists all the observed bug sets. Debuggy found bug sets for 617 of the predicted test
solutions (including almost perfect as well as perfect matches). There were 119 distinct bug sets.
Appendix 4 lists them. With 134 bug sets in OST and 119 bug sets in PST, the observational
adequacy can (at last!) be calculated:



    OST∩PST      11 bug sets
    OST-PST     123 bug sets
    PST-OST     108 bug sets, of which 53 contain star bugs.


* The bugs are: Borrow-Across-Second-Zero (7 occurrences), Doesn't-Borrow-Except-Last (1
occurrence), Only-Do-Units (1 occurrence), Smaller-From-Larger-Except-Last (3 occurrences),
Smaller-From-Larger-Instead-of-Borrow-Unless-Bottom-Smaller (7 occurrences), and
Top-Instead-of-Borrow-From-Zero (1 occurrence).



Appendices 5 and 6 list the bug sets in each of OST∩PST, OST-PST, and PST-OST.
Interestingly, the bug sets in OST∩PST happen to include several of the most common observed
bug sets. Of the eleven most common bug sets (see figure 2-20), it includes bug sets 2, 3, 4, 5 and
8.

Because a bug-based measure of observational adequacy is being used, rather than one based
on raw test solutions, the figures above can be dissected to discover why the observational adequacy
is the way it is. For instance, why is the most common bug set, {Smaller-From-Larger}, not in
OST∩PST? To answer such questions, each bug set in OST-PST is intersected with the bug sets
in PST-OST. This forms four new categories, depending on whether the intersection is empty or
not:

               non-empty      empty

OST-PST            76           47
PST-OST            68           40



This chart shows that 76 bug sets from OST-PST had at least one bug generated by the model.
These 76 bug sets include the popular bug set {Smaller-From-Larger} because it overlapped with
the predicted bug set {Smaller-From-Larger *Only-Do-First&Last-Columns}. The second bug in
the predicted bug set is a star bug. It is generated because there is no Foreach loop in the
representation language. When a Foreach loop is added, {Smaller-From-Larger} will be properly
predicted. Using overlaps, one can find out what needs to be done to improve observational
adequacy. The overlaps are listed in appendices 5 and 6.
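The overlap classification can be illustrated with ordinary set operations. The sketch below is an assumption-laden toy: the function name is hypothetical, bug sets are modelled as frozensets of bug names, and the two small collections stand in for OST-PST and PST-OST rather than reproducing the appendices.

```python
def split_by_overlap(bug_sets, other_bug_sets):
    """Split bug_sets into those sharing a bug with other_bug_sets and the rest."""
    # Every bug that appears anywhere in the other collection.
    other_bugs = frozenset().union(*other_bug_sets)
    overlapping = [s for s in bug_sets if s & other_bugs]
    disjoint = [s for s in bug_sets if not (s & other_bugs)]
    return overlapping, disjoint


# Stand-ins for a few observed-only and predicted-only bug sets.
observed_only = [
    frozenset({"Smaller-From-Larger"}),
    frozenset({"Diff-0-N=N"}),
]
predicted_only = [
    frozenset({"Smaller-From-Larger", "*Only-Do-First&Last-Columns"}),
]

non_empty, empty = split_by_overlap(observed_only, predicted_only)
print(len(non_empty), len(empty))  # 1 1
```

Here {Smaller-From-Larger} lands in the non-empty group because one of its bugs also appears in a predicted bug set, which is the situation the text describes for the most common observed bug set.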

       occurs   bug
  1.    103     (Smaller-From-Larger)
  2.     34     (Stops-Borrow-At-Zero)
  3.     13     (Borrow-Across-Zero)
  4.     10     (Borrow-From-Zero)
  5.     10     (Borrow-No-Decrement)
  6.      7     (Stops-Borrow-At-Zero Diff-0-N=N)
  7.      6     (Always-Borrow-Left)
  8.      6     (Borrow-Across-Zero !Touched-0-Is-Ten)
  9.      6     (Borrow-Across-Zero Diff-0-N=N)
 10.      6     (Borrow-Across-Zero-Over-Zero Borrow-Across-Zero-Over-Blank)
 11.      6     (Stops-Borrow-At-Zero Borrow-Once-Then-Smaller-From-Larger Diff-0-N=N)

Figure 2-20
The eleven most common bug sets, with the number of times each occurred.

From the overlaps, one can see that the main reason that OST∩PST is so small is that many
students have bugs in addition to the bugs generated by the model. For instance, there is a set of
bugs that the model does not generate that all involve mis-answering columns whose top digit is a
zero. The bugs in this class are (appendix 1 contains descriptions of these bugs):

    Diff-0-N=0
    Diff-0-N=N
    0-N=N-After-Borrow
    0-N=0-After-Borrow
    0-N=N-Except-After-Borrow
    0-N=0-Except-After-Borrow

Suppose the model were able to generate these bugs in such a way that they could occur in any bug
set that the model currently generates. This is not so implausible; a story for how that might have
happened will be presented. If any of the above 0-N bugs could occur in the bug sets of PST,
then OST∩PST would triple in size, becoming 43 bug sets large. The point is that small increases
in the productivity of the model with respect to primitive bugs can translate into big gains when
counting bug sets. This effect can be seen clearly with the aid of a toy example. Suppose that
there were only ten primitive observed bugs, and that the model generates just two of them.
Suppose further that all 45 pairs of these ten bugs occur as bug sets. Only one of the observed
bug sets will be in OST∩PST. If the model generated three or four bugs instead of two, the
figures would change, but not rapidly:



    2 bugs   3 bugs   4 bugs

       1        3        6      OST∩PST
      16       21       24      OST-PST with non-empty intersections
      28       21       15      OST-PST with empty intersections



If the model is generating less than half of the observed primitive bugs, then counting bug sets
makes it look much worse. (Conversely, if it generates more than half the primitive bugs, counting
bug sets makes it look much better.) This suggests measuring observational adequacy with respect
to primitive bugs, rather than bug sets.
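The toy example can be checked by brute force. The sketch below enumerates all pairs of ten primitive bugs and counts how many pairs contain two, one or none of the bugs the model generates. The function name is hypothetical, and it assumes, as the toy example appears to, that the model predicts every pair of its own bugs.

```python
from itertools import combinations


def toy_counts(total_bugs, generated_bugs):
    """Count bug pairs by how many of their two bugs the model generates."""
    generated = set(range(generated_bugs))  # the model's primitive bugs
    both = one = none = 0
    for pair in combinations(range(total_bugs), 2):
        hits = len(generated.intersection(pair))
        if hits == 2:
            both += 1   # pair is in the intersection of observed and predicted
        elif hits == 1:
            one += 1    # observed only, but overlapping the model's bugs
        else:
            none += 1   # observed only, no overlap with the model's bugs
    return both, one, none


for k in (2, 3, 4):
    print(k, toy_counts(10, k))
# 2 (1, 16, 28)
# 3 (3, 21, 21)
# 4 (6, 24, 15)
```

With k generated bugs out of ten, the three counts are C(k,2), k(10-k) and C(10-k,2), which is why the first row grows so slowly while k is small.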

Let OB be the union over all the bug sets in OST. OB is a set of 76 bugs. Let PB be the
union over PST. PB contains 49 bugs. Then:

    OB∩PB     25 bugs
    OB-PB     51 bugs
    PB-OB     24 bugs, of which 7 are star bugs

Appendix 3 lists these bugs. Figure 2-21 displays the figures as a Venn diagram. Essentially, these
figures say that half of the theory's predictions are confirmed. On the other hand, there is much
work left to do, because two-thirds of the observed bugs are not yet accounted for.
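The "half" and "two-thirds" readings follow directly from the three set sizes. A minimal arithmetic sketch, with a hypothetical function name and dictionary keys, makes the two ratios explicit:

```python
def adequacy_summary(observed_count, predicted_count, shared_count):
    """Derive difference sizes and ratios from |OB|, |PB| and |OB∩PB|."""
    observed_only = observed_count - shared_count    # |OB-PB|
    predicted_only = predicted_count - shared_count  # |PB-OB|
    return {
        "observed_only": observed_only,
        "predicted_only": predicted_only,
        # Share of the model's predicted bugs that were actually observed.
        "confirmed_fraction": shared_count / predicted_count,
        # Share of the observed bugs that the model does not generate.
        "unaccounted_fraction": observed_only / observed_count,
    }


summary = adequacy_summary(observed_count=76, predicted_count=49, shared_count=25)
print(summary["observed_only"], summary["predicted_only"])  # 51 24
print(round(summary["confirmed_fraction"], 2))              # 0.51
print(round(summary["unaccounted_fraction"], 2))            # 0.67
```

The two ratios use different denominators: confirmation is measured against the predictions (25 of 49), while coverage is measured against the observations (51 of 76 missing).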



[Venn diagram with regions labeled observed, predicted and star]

Figure 2-21
Venn diagram showing relationships and sizes of the sets of predicted, observed and star bugs.



Ill A comparison with other generative theories of bugs

The numbers presented here are difficult to understand without some point of reference. Two
such points are provided by earlier generative theories of subtraction bugs. An early version of
repair theory is documented in Brown and VanLehn (1980). Its empirical adequacy can be
compared with the present theory's. Clearly, this theory will do better since it includes the ideas of
its predecessor. Another generative theory of subtraction bugs was developed by Richard Young
and Tim O'Shea (1981). They constructed a production system for subtraction such that deleting
certain of its rules (or adding certain other rules, in some cases) would generate observed bugs.
They showed that these mutations of the production system could generate many of the bugs
described in the original Buggy report (Brown & Burton, 1978). It is important to note that many
of the 76 currently known subtraction bugs were not yet observed back then. One can assume that
their model would generate more bugs than the ones reported in (Young & O'Shea, 1981). Section
10.3 discusses their approach in some detail.

A chart comparing the results of the three theories is presented as figure 2-22. Observed bugs
that no theory generates are not listed, nor are bugs that have not been observed. (N.B., the figures
in Brown & VanLehn (1980) count bugs differently than the way they are counted here. That
report counts combinations of bugs with coercions as distinct bugs; see the note on coercions in
appendix 2.) The chart shows that the present theory generates more bugs, which is not surprising
since it embeds many of the earlier theories' ideas. What is perhaps a little surprising is that there
are a few bugs that they generate and it does not. These bugs deserve a closer look.

Y&O   B&V   Cur.   Occurs   Bug

[The per-theory marks, occurrence counts and totals of this chart are not legible in this copy.]

                            Always-Borrow-Left
                            Blank-Instead-of-Borrow
                            Borrow-Across-Second-Zero
                            Borrow-Across-Zero
                            Borrow-Don't-Decrement-Unless-Bottom-Smaller
                            Borrow-From-One-Is-Nine
                            Borrow-From-One-Is-Ten
                            Borrow-From-Zero
                            Borrow-From-All-Zero
                            Borrow-From-Zero-Is-Ten
                            Borrow-No-Decrement
                            Borrow-No-Decrement-Except-Last
                            Borrow-Treat-One-As-Zero
                            Can't-Subtract
                            Doesn't-Borrow-Except-Last
                            Diff-0-N=0
                            Diff-0-N=N
                            Diff-N-N=N
                            Diff-N-0=0
                            Don't-Decrement-Zero
                            Forget-Borrow-Over-Blanks
                            N-N-Causes-Borrow
                            Only-Do-Units
                            Quit-When-Bottom-Blank
                            Stutter-Subtract
                            Smaller-From-Larger
                            Smaller-From-Larger-Except-Last
                            Smaller-From-Larger-Instead-of-Borrow-From-Zero
                            Smaller-From-Larger-Instead-of-Borrow-Unless-Bottom-Smaller
                            Stops-Borrow-At-Multiple-Zero
                            Stops-Borrow-At-Zero
                            Top-Instead-of-Borrow-From-Zero
                            Zero-Instead-of-Borrow
                            totals

Figure 2-22
Comparison of observed bugs generated by three theories:
Y&O = Young and O'Shea; B&V = Brown & VanLehn; Cur = current theory



Early repair theory generates a bug called Stutter-Subtract that the present theory does not
generate:

    Stutter-Subtract:      345         345         897
                          -  2        -222        - 67
                           ---         ---         ---
                           123 ✗       123 ✓       230 ✗

This bug does not know how to handle one-digit columns. It impasses when it tries to do such a
column. Early repair theory used a repair called Refocus Right to fix the impasse. It would cause
the column difference operations to use the nearest digit in the bottom row instead of the blank.
Thus, the second column in the first problem is answered with 4-2.

The present theory has a non-directional Refocus repair. It finds the fetch pattern responsible
for the current focus of attention and rematches the pattern. It finds the closest match that will get
the procedure past the impasse. In this case, there is no such match due to the way the grammar
structures the subtraction problem. To obtain the equivalent of Refocus Right would require a
grammar that views the problem as three multidigit rows instead of a list of three-place columns.
Such a grammar would probably generate Stutter-Subtract but might generate some star bugs as
well.

The point behind the Stutter-Subtract story is that early repair theory had some notational
knowledge embedded in its repairs. It had several Refocus repairs, and they were specialized for the
grid-like notation of subtraction. In the present theory, all knowledge about notation is embedded
in the grammar. The Refocus repair is general. It doesn't know about any particular notation. In
the early theory, it was stated that the repairs were specializations of weak, general-purpose methods.
They were tailored for subtraction. In the present theory, the repairs actually are general-purpose
methods, not specializations.
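The surface behavior of Stutter-Subtract is easy to simulate, which makes the role of the notation-specific repair concrete. The sketch below is not how either version of repair theory derives the bug. It simply hard-codes the Refocus Right effect (a blank bottom cell is replaced by the nearest bottom digit) in a hypothetical function, and it only covers problems in which no column needs borrowing.

```python
def stutter_subtract(top, bottom):
    """Column-by-column subtraction in which a blank bottom cell
    reuses the nearest (leftmost) digit of the bottom row."""
    top_digits = [int(d) for d in str(top)]
    bottom_digits = [int(d) for d in str(bottom)]
    blanks = len(top_digits) - len(bottom_digits)
    if blanks < 0:
        raise ValueError("the top row must be at least as long as the bottom row")
    # Fill each blank cell with the nearest bottom-row digit.
    padded = [bottom_digits[0]] * blanks + bottom_digits
    answer = []
    for t, b in zip(top_digits, padded):
        if t < b:
            raise ValueError("this sketch covers only columns that need no borrowing")
        answer.append(str(t - b))
    return int("".join(answer))


print(stutter_subtract(897, 67))  # 230
```

Because the padding step knows about rows and blanks, it carries exactly the kind of notational knowledge that the present theory moves out of the repairs and into the grammar.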

Young and O'Shea's model generates a class of bugs that they call "pattern errors." At that
time, four bugs were included in this class:

    Diff-0-N=N    If the top of a column is 0, write the bottom as the answer.

    Diff-0-N=0    If the top of a column is 0, write zero as the answer.

    Diff-N-0=0    If the bottom of a column is 0, write the zero as the answer.

    Diff-N-N=N    If the top and bottom are equal, write one of them as the answer.

Young and O'Shea derive all four bugs the same way. Each bug is represented by a production
rule, and the rule is simply added to the production system that models the student's behavior. Put
differently, they derive the bugs formally by stipulating them, then explain the stipulation
informally. Their explanations are:

The zero-pattern errors are also easily accounted for, since particular pattern-sensitive rules fit
naturally into the framework of the existing production system. For example, from his earlier
work on addition, the child may well have learned two rules sensitive to zero, NZN and ZNN
[two rules that mean N±0=N and 0±N=N]. Included in a production system for
subtraction, the first, NZN, will do no harm, but rule ZNN will give rise to errors of the
"0-N=N" type. Similar rules would account for the other zero-pattern errors. If the child
remembers from addition just that zero is a special case, and that if a zero is present then one
copies down as the answer one of the numbers given, then he may well have rules such as
NZZ or ZNZ [the rules for the bugs Diff-N-0=0 and Diff-0-N=0]... Rule NNN [the rule
for the bug Diff-N-N=N] covers the cases where a child asked for the difference between a
digit and itself writes down that same digit. It is clearly another instance of a "pattern" rule.
(Young & O'Shea, 1981, pg. 163).

The informal explanations, especially the one for Diff-0-N=N, are plausible. To treat them fully,
one would have to explain why only the zero rules are transferred from addition, and not the other
addition rules.
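What "deriving a bug by stipulation" amounts to can be seen in a small sketch. It is not Young and O'Shea's production system: the rule table, the function name and the way rules are consulted are all assumptions made for illustration. Each pattern bug is one extra rule, and the bug appears as soon as the rule is switched on.

```python
# One stipulated rule per pattern bug; each returns an answer or None.
PATTERN_RULES = {
    "Diff-0-N=N": lambda top, bottom: bottom if top == 0 else None,
    "Diff-0-N=0": lambda top, bottom: 0 if top == 0 else None,
    "Diff-N-0=0": lambda top, bottom: 0 if bottom == 0 else None,
    "Diff-N-N=N": lambda top, bottom: top if top == bottom else None,
}


def column_answer(top, bottom, active_rules=()):
    """Answer one column, letting any active pattern rule fire first."""
    for name in active_rules:
        result = PATTERN_RULES[name](top, bottom)
        if result is not None:
            return result
    if top < bottom:
        raise ValueError("borrowing is outside the scope of this sketch")
    return top - bottom  # the correct column difference


print(column_answer(0, 5, ["Diff-0-N=N"]))  # 5
print(column_answer(4, 4, ["Diff-N-N=N"]))  # 4
print(column_answer(7, 3))                  # 4
```

Nothing in the table says why these four rules, and not others, should be present. That gap is the point the text is making about stipulated rules.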

The point is that one can have as much empirical adequacy as one wishes if the theory is not
required to explain its stipulations in a rigorous, formal manner. The present theory could generate
the same pattern bugs as Young and O'Shea's model simply by adding the appropriate rules to the
AOGs and reiterating their informal derivation (or telling any other story that seems right
intuitively). This would not be an explanation of the bugs, but only a restatement of the data
embroidered by interesting speculation. This approach does not yield a theory with explanatory
value. In short, there is a tradeoff between empirical adequacy and explanatory adequacy. If the
model is too easily tailored, then it is the theorist and not the theory that is doing the explaining.
The theory per se has little explanatory value. So tailorability, and explanatory adequacy in
general, are key issues in evaluating the adequacy of the theory.

Hillclimbing 

It would be foolish to claim that the present theory is wonderful because half of its
predictions about the Southbay experiment were confirmed. It would be equally foolish to assert
that the theory is in desperate need of improvement because it models only one-third of the bugs
in the Southbay sample. The observational adequacy figures are meaningless in isolation. As an
absolute measure of theoretical quality, observational adequacy is nearly useless. However, it is
excellent as a relative measure of theoretical quality. One takes two theories and compares their
observational adequacy over the same data, taking care to study their tailorability as well.

Observational adequacy is particularly useful in comparing a new version of the theory to an
older version. This allows one to determine whether the new version improves the empirical quality
or hurts it. Indeed, this is how the present theory arrived at its current form. To put it in the
language of heuristic search (which some claim is a good metaphor for scientific discovery),
observational adequacy has been used to hillclimb: to find a maximum in the space of possible
theories. The claim, therefore, is not that the theory's current degree of observational adequacy is
good or bad in an absolute sense, but rather that it is the best that any theory can do, given the
same data and the same objectives.

There is a well known problem with hillclimbing. One can get trapped at a local maximum
that is not a global maximum. A common solution to this problem is to begin with a gross
representation of the landscape so that the search can find the general lay of the land and thereby
determine approximately where the global maxima will be. This done, hillclimbing can be done at
the original level of detail, but remaining in the limited area where any local maxima are likely to
be global maxima as well. The same strategy has been used in this research (or at least, one can
reconstruct the actual research history this way). There are three levels of hypotheses (which are
also the three levels of organization of the following chapters). The most general level, the
architecture level, is a gross representation of the cognitive landscape. It addresses general issues,
such as whether learning is basically inductive or not. Hillclimbing in the architectural level yields
several hypotheses that define the theory in a non-detailed way. The next lower level of detail is
the representation level. It searches through a thicket of knowledge representation issues, e.g.,
whether procedures should be hierarchical or not. The third level, the bias level, is the last stage of
hillclimbing. It finds hypotheses about inductive biases that will optimize the fit between the
model's predictions and the data. Because the arguments for hypotheses are structured into gross,
medium and fine levels of detail, one can be somewhat assured that the hillclimbing implicit in this
strategy has brought the theory to a global maximum.
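The coarse-to-fine strategy can be shown on a toy one-dimensional landscape. Everything in the sketch is an assumption made for illustration (the heights, the stride and the function names); it only demonstrates the search idea and says nothing about the space of theories itself.

```python
def hillclimb(score, start, lo, hi):
    """Greedy ascent over the integers lo..hi, one step at a time."""
    current = start
    while True:
        neighbours = [n for n in (current - 1, current + 1) if lo <= n <= hi]
        best = max(neighbours, key=score)
        if score(best) <= score(current):
            return current  # no neighbour is better: a (local) maximum
        current = best


def coarse_to_fine(score, lo, hi, stride):
    """Sample the landscape coarsely, then hillclimb from the best sample."""
    coarse_points = range(lo, hi + 1, stride)
    start = max(coarse_points, key=score)
    return hillclimb(score, start, lo, hi)


# Toy landscape with a local maximum at index 3 and the global one at index 11.
heights = [0, 1, 2, 3, 2, 1, 0, 1, 2, 4, 6, 9, 6, 3, 1, 0]


def score(i):
    return heights[i]


print(hillclimb(score, 0, 0, len(heights) - 1))          # 3
print(coarse_to_fine(score, 0, len(heights) - 1, 4))     # 11
```

The coarse pass is a heuristic, not a guarantee: with a stride that happens to miss the high region entirely, the fine search would still end on a local maximum.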

It bears reiterating that empirical quality is not the only measure of theoretical validity. It
must be balanced against explanatory adequacy: does the theory really explain the phenomena or
does it just recapitulate them, perhaps because they have been tailored into the model's parameter
settings? This theory is quite strong in the explanatory department. The model takes only three
inputs, and these inputs are such that the theorist has little ability to tailor the predictions to the
data. This implies that the predictions are determined by the structure of the model, which is in
turn determined by the hypotheses of the theory. So, the competitive argumentation that fills the
remaining chapters can be construed as a hillclimbing adventure where the measure of progress is a
combination of increasing observational adequacy and decreasing tailorability.

Chapter 3
Getting Started

The methodological goal of this research is to provide competitive arguments for supporting
each hypothesis. But when every hypothesis needs a motivation, and a motivation needs other
hypotheses, then getting started is difficult. Some hypotheses must be given support that does not
depend on any other hypotheses. These initial hypotheses are often called, somewhat unfairly,
assumptions. This chapter states the theory's assumptions, which are two: students learn inductively,
and what they learn are procedures. The chapter tries to make these two hypotheses appear
plausible in various ways. Later chapters will be able to use these initial hypotheses as the
foundations for competitive argumentation; here, there is no such foundation, so hands must be
waved.



3.1 Teleology or program?

The first assumption is that students' knowledge about procedures is schematic but not
teleological or prototypical. To define these terms, "schematic," "teleological," and "prototypical,"
several other terms must be introduced. (Figure 3-1 is a road map for the terms that will be
introduced.) Computer programmers generally describe a procedure in three ways:

► Program: A program is a schematic description of actions. It must be instantiated, by giving
it inputs, before it becomes a complete description of a chronological sequence of actions.

► Action sequences: One can describe a particular instance of a program as a chronological
sequence of actions (or as a sequence of problem states; for the present discussion we'll use
action sequences, reserving problem state sequences to serve the same purpose later). That is,
executing a program produces an action sequence. In principle, one could describe a
procedure (N.B., the term "procedure" is being used temporarily to mean some very abstract,
neutral idea about systematic actions) as a possibly infinite set of action sequences. This is
analogous to specifying a mathematical function as a set of tuples (e.g., n! as {<0,1>, <1,1>,
<2,2>, <3,6>, <4,24>, ...}). Action sequences are not usually used this way. Their most
common use in programming practice is in reporting times where a program did something
unexpected (i.e., bug reports).

► Specifications: Specifications say what a program ought to do. Often they are informally
presented in documents that circulate among the programmers and market researchers on a
product development team. Sometimes specifications are written in a formal language so
that one can prove that a certain program meets them.
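The difference between a program and an action sequence can be made concrete with the factorial example. In the sketch below the function names and the wording of the recorded actions are assumptions. The program is schematic until it is given an input, and each run produces one particular action sequence.

```python
def factorial_program(n, actions):
    """A program: schematic until instantiated with an input n."""
    result = 1
    for k in range(1, n + 1):
        result *= k
        actions.append(f"multiply by {k} -> {result}")  # record each action
    return result


def run(n):
    """Execute the program on n, returning its value and its action sequence."""
    actions = []
    value = factorial_program(n, actions)
    return value, actions


# One execution yields one action sequence.
print(run(3))
# (6, ['multiply by 1 -> 1', 'multiply by 2 -> 2', 'multiply by 3 -> 6'])

# Describing the function extensionally, as a set of tuples.
tuples = [(n, run(n)[0]) for n in range(5)]
print(tuples)  # [(0, 1), (1, 1), (2, 2), (3, 6), (4, 24)]
```

The list of tuples covers only the inputs that were actually run, which is the limitation the text has in mind when it calls the full set "possibly infinite."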

There are names for the processes of transforming information about the procedure from one level
to another. Programming is the transformation of a specification into a program. Execution,
interpretation and running are names for the transformation of a program into an action sequence.
There are also names for static, structural representations of these transformations. A trace is a
structural representation of the relationship between a program and a particular execution of it. A
procedural net (Sacerdoti, 1977), a derivation (Carbonell, 1983b) and a planning net (VanLehn &
Brown, 1980) are all formal representations of the relationship between a specification and a
program. Actually, these three terms are just a few of the formalisms being used in a rapidly
evolving area of investigation. Rich (1981) has concentrated almost exclusively on developing
a formalism describing the relationship between a specification and a program. In his
representation system, both the specification and the program are plans; the surface plan
(program) is just a structural refinement of the other. Rather than seeming to commit to one or
another of these various formalisms, the neutral term "teleology" will be used. Thus, the teleology
of a certain program is information relating the program and its parts to their intended purposes
(i.e., to the specification).

                    Specifications
      Teleology           |           Programming
                       Program
      Trace               |           Executing
                    Action Sequence

Figure 3-1
Three levels of description for a "procedure." Names for the processes of converting from higher
levels to lower levels are on the right. Names for conversion structures are on the left.

Since "teleology" is a new term, it is worth a moment to sketch its meaning. The teleology of
a procedure relates the surface structure (program) of the procedure to its goals and other design
considerations. The teleology might include, for instance, a goal-subgoal hierarchy. It might
indicate which goals serve multiple purposes, and what those purposes are. It might indicate which
goals are crucially ordered and which goals can be executed in parallel. If the program has
iterations or recursions, it indicates the relationship between the goals of the iteration body (or
recursion step) and the goal of the iteration (recursion) as a whole. In general, the procedure's
teleology explicates the design behind the procedure.

It is an empirical question which level of description (action sequences, traces, program,
teleology or specification) most closely corresponds to the knowledge that students acquire. It is
possible that students could simply memorize the examples that they have seen. In this case, their
knowledge of the procedure would be appropriately represented by a set of action sequences. One
might call them "prototypes" for the procedure (cf. theories of natural kind terms based on
prototypes, e.g., Rosch & Mervis, 1975). However, it is a fact that students' knowledge of
mathematical procedures is productive, in the sense that they can solve problems that they have
never seen solved. As discussed in section 1.2, accounting for this productivity is problematic when
knowledge is represented as sets of action sequences. It will be assumed that students' knowledge is
not prototypical. In fact, the only two levels of description that seem at all plausible are programs
and teleologies. Finding empirical differences between them is subtle, but not impossible.

Consider, for instance, a procedure for making gravy. A novice cook often knows only the
surface structure (program) of the gravy recipe: which ingredients to add in which order. The
expert cook will realize that the order is crucial in some cases, but arbitrary in others. The expert
also knows the purposes of various parts of the recipe. For instance, the expert understands a
certain sequence of steps as making a flour-based thickener. Knowing the goal, the expert can
substitute a cornstarch-based thickener for the flour-based one. More generally, knowing the
teleology of a procedure allows its user to adapt the procedure to special circumstances (e.g.,
running out of flour). It also allows the user to debug the procedure. For instance, if the gravy
comes out lumpy, the expert cook can infer that something went wrong with the thickener.
Knowing which steps of the recipe make the thickener, the cook can discover that the bug is that
the flour-fat mixture (the roux) wasn't cooked long enough. The purpose of cooking the roux is to
emulsify the flour. Since the sauce was lumpy, this purpose wasn't achieved. By knowing the
purposes of the parts of the procedure, people are able to debug, extend, optimize, and adapt their
procedures. These added capabilities, beyond merely following (executing) a procedure, can be
used to test for a teleological understanding.

Do students acquire the teleology of mathematical procedures?

Gelman and her colleagues (Gelman & Gallistel, 1978; Greeno, Riley & Gelman,
forthcoming) used tests based on debugging and extending procedures in order to determine
whether children possess the teleology for counting (young children don't, older children do).
Adapting their techniques, I tested five adults for possession of teleology for addition and
subtraction. All subjects were competent at arithmetic. None were computer programmers. The
subjects were given nine tasks. Each task added some extra constraint to the ordinary procedure,
thereby forcing the subject to redesign part of the procedure in order to bring it back into
conformance with its goals. A simple task, for example, was adding left to right. A more complex
task was inventing the equal additions method of borrowing (i.e., the borrow of 53-26 is performed
by adding one to the 2 rather than decrementing the 5). The results were equivocal. One subject
was unable to do any of the tasks. The rest were able to do some but not all of the tasks. The
experiment served only to eliminate the extremes: Adults don't seem to possess a complete, easily
used teleology, but neither are they totally incapable of constructing it (or perhaps recalling it).
Further experiments of this kind may provide more definitive results. In particular, it would be
interesting to find out if adults were constructing the teleology of the procedure, or whether they
already knew it. At any rate, it's clear that not all adults possess operative teleology for their
arithmetic procedures, and moreover, some adults seem to possess only surface structures (programs)
for accomplishing a task.
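The two borrowing methods mentioned above can be contrasted in a short sketch. It is only an illustration under stated assumptions: the function names are hypothetical, the top number is assumed to be at least as large as the bottom number, and nothing here is taken from the tasks as they were actually given to the subjects.

```python
def digits_lsd_first(n, width):
    """Digits of n, least significant first, padded with zeros to `width`."""
    ds = [int(ch) for ch in reversed(str(n))]
    return ds + [0] * (width - len(ds))


def to_number(ds):
    """Inverse of digits_lsd_first for single-digit entries."""
    return int("".join(str(d) for d in reversed(ds)))


def subtract_decomposition(top, bottom):
    """Ordinary borrowing: decrement the top digit one column to the left."""
    width = len(str(top))
    t, b = digits_lsd_first(top, width), digits_lsd_first(bottom, width)
    answer = []
    for i in range(width):
        if t[i] < b[i]:
            t[i] += 10
            t[i + 1] -= 1  # e.g. for 53-26, the 5 becomes 4
        answer.append(t[i] - b[i])
    return to_number(answer)


def subtract_equal_additions(top, bottom):
    """Equal additions: add one to the bottom digit one column to the left."""
    width = len(str(top))
    t, b = digits_lsd_first(top, width), digits_lsd_first(bottom, width)
    answer = []
    for i in range(width):
        if t[i] < b[i]:
            t[i] += 10
            b[i + 1] += 1  # e.g. for 53-26, the 2 becomes 3
        answer.append(t[i] - b[i])
    return to_number(answer)


print(subtract_decomposition(53, 26), subtract_equal_additions(53, 26))
```

Both variants serve the same goal, leaving the difference between the two numbers unchanged, which is the kind of knowledge a subject would need in order to invent one method from the other.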

Adults found the teleology test so difficult that I was unwilling to subject young children to it.
However, there is some indirect evidence that students acquire very little teleology. It concerns the
way students react to impasses (i.e., getting stuck while executing a procedure). Consider the
decrement-zero impasse discussed in section 2.9. A hypothetical student hasn't yet learned how to
borrow from zero although borrowing from non-zero numbers is quite familiar. Given the problem



the student starts to borrow, attempts to decrement the zero, and reaches an impasse. If the student 
understands the teleology of borrowing, then the student understands that borrowing from the
hundreds would be an appropriate way to fix the impasse. The purpose of borrowing is to increase
a certain place's value while maintaining the value of the whole number. Here, the tens place needs
to be increased so that it can be decremented. Borrowing will serve this purpose. In short, the
teleology of non-zero borrowing allows it to be easily extended to cover borrowing from zero.
Although some students may react to the decrement-zero impasse this way, many do not. They use
local problem solving instead, as discussed in section 2.9. Because students do not make
teleologically appropriate responses to impasses, it appears that they did not acquire much teleology
(or if they did, they are unwilling to use it, in which case it's a moot point whether they have it or
not). 
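The teleological extension described above can be sketched as a recursive borrow whose only justification is that it preserves the value of the whole number. This is a hedged illustration, not the theory's model: the names `value` and `borrow` are hypothetical, and the numeral 305 is an arbitrary choice for the example.

```python
def value(digits):
    """Number denoted by a list of place values, most significant first.
    A place may temporarily hold more than 9 during borrowing."""
    total = 0
    for d in digits:
        total = total * 10 + d
    return total


def borrow(digits, place):
    """Add ten to digits[place] by taking one from the place to its left.
    If that place holds zero, first borrow into it from further left."""
    left = place - 1
    if left < 0:
        raise ValueError("nothing to borrow from")
    new = list(digits)
    if new[left] == 0:
        new = borrow(new, left)  # increase the zero so it can be decremented
    new[left] -= 1
    new[place] += 10
    return new


before = [3, 0, 5]
after = borrow(before, 2)
print(after, value(before) == value(after))
```

The recursive call is the step that a student with the teleology of non-zero borrowing could supply unaided; a student with only the program reaches an impasse at the zero instead.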

Mathematical procedures are perhaps a little different from other human procedures in that
their teleology is quite complex. The complexity is due partly to the fact that the procedures
manipulate a representation (e.g., base-10 numerals) rather than the objects of interest themselves
(e.g., numbers). A procedure for making gravy does not have this problem. Cooks don't
manipulate representations of flour and water; they manipulate the real stuff. An added complexity
in arithmetic procedures is the way their teleologies merge loops to accomplish several goals at once
(VanLehn & Brown, 1980). The teleology of loops is so complex that only recently has AI made
much progress on analyzing it (Waters, 1978; Rich, 1981). In other task domains than learning
mathematical procedures, more students might show evidence of teleological knowledge. In the
present domain, it is safe to assume that students' knowledge is more like a program than a
teleology. That is, the knowledge is schematic rather than teleologic (or prototypical).

3.2 What kind of learning goes on in the classroom?

The second assumption made by the theory is that students acquire their procedures by 
induction; they generalize from examples. This section tries to make that assumption seem
plausible. First, it shows that inductive learning is consistent with the gross features of the students'
classroom experiences. Then, it presents several other ways that procedures could be learned, and
casts a little doubt on each of them.

No one knows precisely what goes on in elementary school. Unlike college classes,
elementary school classes are not just lectures and recitations. For much of the day, the child lives
in the school classroom. Many activities that go on there have little to do with learning. Schools
are like business offices in this respect. Despite the fact that both schools and offices have ostensive
purposes, it is impossible to precisely describe all the activities going on inside their walls.
However, the gross features of classroom settings are uncontroversial.

In elementary school, math is taught once a day for a little less than an hour, usually in the
morning when children are least restless. The most common instructional activity is seatwork: the
students work exercises in their seats, occasionally asking the teacher for help. Figure 3-2 shows
how one study of math classes divides instruction time. It excludes non-instructional activities such
as collecting homework, dividing into groups, and dealing with disciplinary problems. The largest
proportion of instructional time is spent in seatwork.

Activity                  Grade 2   Grade 5

seatwork                    55%       70%
discussion, recital         30%       20%
lecture, demonstration      10%        5%
games                        5%        5%

total                      100%      100%

Figure 3-2

Gross proportions of time spent in various instructional activities.
Adapted from Ramos-4 data reported in (McDonald & Elias, 1975).

Most teachers follow the lesson plans of the textbook rather closely. The contents of the
teacher's textbook can be taken as a rough approximation to the material that the teacher actually
presents. Judging from the textbooks, the calculational procedures that I am calling "mathematical
skills" are only a fraction of the mathematical curriculum. Moreover, they are not taught in a nice
compact unit as one might expect from college curricula. The lessons that introduce the
components of a procedure are scattered over several years. In the Scott-Foresman textbook, for
instance, the first lessons on the multicolumn subtraction procedure occur midway through second
grade. The various subprocedures, such as traversing columns and borrowing, are introduced in six
chapters scattered through the last half of the second grade, the third grade, and the first half of the
fourth grade. A chapter typically has one or two lessons that introduce a new subskill, several
lessons reviewing previously taught subskills, and a chapter test. During the two years that the
subtraction procedure is actively taught (it is reviewed for many years thereafter), the students
cover about 600 pages of text. At most 90 pages directly address the subtraction procedure. These
90 pages include not only the lessons introducing new subskills, but also review lessons and tests.
Page counts can be translated into a rough measure of the time spent learning subtraction. If one
puts the school year at about 175 days of usable instruction, and math occupies an hour a day, then
it works out that the subtraction procedure is taught in about 50 hours. These are just rough
estimates, of course. The main points are that the subtraction procedure does not consume much
instructional time, that most of that time is spent on review, and that the skill is introduced
gradually over a long period. The same comments apply to other calculation skills. Algebra
equation solving is introduced in the fifth grade, in the Scott-Foresman textbook series mentioned
above. By the time the student takes high school algebra, most of linear equation solving has
already been presented.*
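The 50-hour figure can be reproduced with a few lines of arithmetic. The sketch assumes, as the estimate itself appears to, that instructional time is spread evenly over textbook pages; the variable names are illustrative only.

```python
pages_total = 600        # pages covered during the two years
pages_subtraction = 90   # upper bound on pages addressing subtraction
years = 2
days_per_year = 175      # usable instructional days
hours_per_day = 1.0      # math period, rounded up to an hour

total_math_hours = years * days_per_year * hours_per_day
subtraction_hours = total_math_hours * pages_subtraction / pages_total

print(total_math_hours, subtraction_hours)
```

Under these assumptions the upper bound comes to 52.5 hours out of 350, which is consistent with the "about 50 hours" quoted above.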


An inductive account of skill acquisition requires that the curriculum provide examples in
appropriate quantities and varieties. Textbooks and teachers provide some worked examples of
mathematical procedures, but not all that many. A typical borrow lesson in a textbook might print
two worked examples and 25 exercises. The teacher will undoubtedly work through several of the
exercises on the blackboard with the class and leave some of them on the chalkboard while the
students do their seatwork. So a lesson might have a half dozen or a dozen examples for the
students to generalize from. The example set is not as small as one or two, but it is not hundreds
either. One question that a computational theory can address, in detail, is whether this moderate
number of examples has sufficient information for induction to succeed. To the first order,
however, it seems that the instruction in use today has enough examples with enough variety to
make an inductive account of learning plausible.



* It is important to consider the whole curriculum when grounding a learning model on textbook
lessons. In particular, it is easy to mistake a review lesson for an introductory lesson. Neves (1981)
built an AI program, ALEX, that learned how to solve simple algebra equations by induction. Neves
tested ALEX using examples abstracted from a high school algebra textbook. The first algebra lesson
that Neves used to test ALEX is probably a review lesson. It presents three operators for solving
linear equations, all on one page. One of these operators is taught as early as the fifth grade in the
Scott-Foresman series. Neves has ALEX learn these three operators from scratch, as if this lesson
were introducing them for the first time. I believe this is a mistake that caused Neves to make
ALEX too powerful to be plausible as a model for the initial acquisition of procedures (see sect. 4.3).

Figure 3-3

Ways to acquire a program level of description.



The gross features of classroom life seem consistent with an inductive account of learning.
But they are also consistent with just about any other account of learning because classrooms are a
rich, almost chaotic environment. Since the assumption of inductive learning will soon be used in
important ways, it is worth casting some doubt on the competing accounts. In order to organize
them somewhat, we'll again use the tripartite distinction between specifications, programs and
action sequences. If the goal is to construct a description of the procedure at a program
(schematic) level, there are four possible routes (see figure 3-3):

1. From specification to program: a kind of learning by doing or discovery.

2. From examples (action sequences) to programs: induction.

3. From some other schematic description, either

(a) another familiar program: learning by analogy, or 

(b) a natural language presentation of the program. 

Each of the four routes has a certain degree of plausibility with respect to the gross features of 
classroom life. Discovery learning would take place while the students solve problems alone,
cogitating over their mistakes. There is ample opportunity to do this in a class where 55% of the
time is spent doing seatwork. Inductive learning requires examples, which the teacher and the text
supply. Analogic learning requires the juxtaposition of familiar procedures with the target
procedures. Modern instruction does some of this by drawing analogies to monetary transactions or
games involving other concrete numerals (e.g., Dienes blocks, Montessori rods, abaci, etc.).
Learning by understanding natural language presentations of procedures has some support in that
recipe-like presentations of procedures are occasionally used in textbooks and (presumably)
classrooms. So all four routes have a certain degree of plausibility.

The next three sections will take three of these four accounts of learning in turn, leaving 
induction aside, and show why each is not a plausible account of the way mathematical procedures
are acquired. In the process, several of the pertinent AI studies of procedure acquisition will be
mentioned (for a comprehensive review, see Cohen & Feigenbaum, 1983). By the way, the
following remarks should not be construed as claims that inductive instruction is the only effective
kind. Indeed, it may be that other kinds of instruction are so effective that the few students who
actually utilize the instruction don't have bugs, and hence there would be no sign of their learning
in the bug data (I doubt this very much). It may be that inductive instruction is not nearly as good as
some other method in that decreasing the curricular emphasis on examples would improve students'
learning. The point is that this is not a theory about what learning should occur; it is only a theory
about what learning does occur.

3.3 Learning by discovery

In discovery learning, students learn on their own by solving problems. The teacher has little
direct involvement except perhaps to suggest projects or problems for the students to tackle or to
answer an occasional question. The key assumption of discovery learning is that the students can
solve problems initially. They may only be able to solve simple problems. They may solve them
by trial and error, making many counterproductive moves in the process. Discovery learning
requires at least this much initial competence of its participants. By solving problems on their own,
students discover ways to avoid counterproductive moves and ways to solve problems that they 
couldn't solve initially. 

Superficially, it appears that discovery learning is common in current mathematical education.
A typical lesson has exercises that are just a little bit harder than the ones in the preceding lesson.
Such a gradual increase in the level of difficulty seems just right for encouraging a discovery learner
to acquire the new subskill that the lesson teaches. One can imagine, for example, a lesson that
teaches carrying with exercises such as a:

a. 3 6 b. 3 6 



Exercise a is just a little harder than b, a problem type that the hypothetical discovery learner has
presumably mastered already. The learner would attack a, first generating c. She would recognize
that the answer of c isn't a proper number so she would fix it, yielding d.



This results in a procedure with two passes: one pass adds the columns, the second pass converts
the answer to the canonical form for numbers. Analyzing this solution and others like it may
eventually lead the learner to discover that the two passes can be merged into one. This merger
would generate the normal add-with-carry procedure. So it seems that discovery learning is quite
compatible with the kinds of instruction that today's students receive. However, when one looks a
little closer, this illusion disappears.
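The two-pass procedure and its merged form can be sketched as follows. This is an illustration only: the function names are hypothetical, both addends are assumed to have the same number of digits, and the addends 36 and 47 are chosen here simply as an example of a problem that requires a carry.

```python
def add_columns(top, bottom):
    """Pass 1: add each column independently; a column sum may exceed 9."""
    return [t + b for t, b in zip(top, bottom)]


def canonicalize(columns):
    """Pass 2: sweep right to left, moving each excess ten into the next column."""
    result, carry = [], 0
    for c in reversed(columns):
        c += carry
        result.append(c % 10)
        carry = c // 10
    if carry:
        result.append(carry)
    return result[::-1]


def add_with_carry(top, bottom):
    """Merged procedure: add a column and carry within a single sweep."""
    result, carry = [], 0
    for t, b in zip(reversed(top), reversed(bottom)):
        s = t + b + carry
        result.append(s % 10)
        carry = s // 10
    if carry:
        result.append(carry)
    return result[::-1]


improper = add_columns([3, 6], [4, 7])   # not a legal base-10 numeral
print(improper, canonicalize(improper), add_with_carry([3, 6], [4, 7]))
```

Note that the second pass presupposes that the learner can tell that the output of the first pass is not a proper numeral, which is the knowledge at issue in the paragraphs that follow.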

The key requirement of discovery learning is that students know enough about the task that 
they can solve it initially, albeit in an inefficient way. This requirement is not a peculiarity of
people, but an apparently necessary part of information processing. All AI discovery learners*
have been equipped with substantial initial domain knowledge. For instance, LEX is a program that



* Except Lenat's AM program, which uses a representation in which relevant task domain
information was unusually dense (Lenat & Brown, 1983). By "AI discovery learners" I mean
programs for planning (e.g., Sacerdoti, 1977; Green & Barstow, 1975; Stefik, 1980), strategy
acquisition (e.g., Hayes-Roth & McDermott, 1976), operationalization (e.g., Mostow, 1981), learning by
doing (e.g., Anderson, 1980; 1982; Anzai, 1979) and learning by debugging (e.g., Sussman, 1976).

discovers how to efficiently solve simple integrals (Mitchell et al., 1983). However, its initial
knowledge of the domain contains a complete set of legal mathematical transformations (e.g.,
integration by parts). This much knowledge is quite a bit more than any beginning calculus student
has. Another well-known discovery learner is Sussman's HACKER program (Sussman, 1976). It
learns procedures for stacking toy blocks. However, its initial state of knowledge contains a
"physics" for the blocks world (e.g., two blocks cannot occupy the same place). This knowledge is
essential for detecting when the partially learned procedure has bugs. Without it, HACKER would
not know to revise its procedures. It would not learn. Similar comments apply to almost all other
AI discovery learners. To put it in terms used before, discovery learners must have a specification
of the procedure (in some form). From that they can derive a program for the procedure.

As noted earlier, students of mathematical procedures seem not to possess the teleology of 
their procedures. However, without at least the specifications for a procedure, they cannot perform
discovery learning. For instance, most young students do not understand the base-10 number
system well enough to see that the answer in problem c above is not a legal number. They lack the
kind of knowledge that HACKER used to detect when its procedure had a bug. Without this kind of
knowledge, there is no way students can discover the carrying subskill on their own.

In the arithmetic domain, the essential problem is that the specifications for procedures must
be couched in terms of preserving relationships between numbers, but the procedures manipulate
base-10 numerals. More generally, all mathematical calculations manipulate symbols and not what
the symbols stand for. Since the symbols do not necessarily obey constraints that preserve their
semantics, the student must know not to violate these constraints. The symbols will not themselves
prevent a student from creating buggy procedures in the way that HACKER's blocks prevent it from
creating buggy procedures.

Many educators and cognitive scientists have tried to find ways to teach mathematical notation
that will enable mathematical calculations to be learned by discovery. A typical technique involves
substituting physical objects for the symbols of the notation. Constraints on the notation are turned
into physical constraints on the objects. For instance, I once tried to use this technique to get
young children to discover carrying. The basic idea was to make the principles of the base-10
system extremely salient by using an appropriate physical representation for numbers. A two digit
number was represented by two egg cartons that were trimmed to hold just nine eggs. If the
student tried to put more than nine eggs in the units carton, they would roll off the table and break.
The idea was to convert a tacit constraint of the base-10 system, that the maximum place-value
holder was nine, into an extremely salient physical constraint. Betsey Summers and I tried to coax
eight beginning arithmetic students to synthesize carrying. Not one would do it. After they were
shown the procedure, they would perform it with no trouble (i.e., they learned it by induction).
Since the constraints defining the task were salient, their failure can only be attributed to an
inability (or perhaps unwillingness) to do the kind of problem solving that discovery learning
requires. I hasten to add that this experiment should not be taken as definitive. Young subjects
present difficult methodological problems. By changing the instructions or the experimental
materials, one can vastly alter the apparent competence of the subjects (c.f. Klahr and Robinson's
study of the Tower of Hanoi, 1981, or Gelman and Gallistel's "magic" experiments, 1978). Resnick
and others have reported more methodical experiments of this kind where some students were able
to discover carrying, borrowing and similar subprocedures (Resnick, 1982). Nonetheless, the general
consensus is that it is difficult and time consuming to teach enough about the semantics of the
notation that students can learn calculations by discovery. It seems safe to assume that little or no
discovery learning occurs in the typical classroom.

3.4 Learning by analogy

Learning by analogy is the mapping of knowledge from one domain over to the target 
domain, where it is applied to solve problems. Winston (1979) showed that learning procedures by 
analogy could be formalized. He constructed a program that could learn to solve Ohm's law
problems by drawing an analogy to hydraulics (see also Brown, 1977; Carbonell, 1983a; 1983b).
Analogies are heavily used in the early grades to teach base-10 numeration. Students are often
drilled on the mapping between written numerals and various concrete representations of
numbers, such as collections of coins, Dienes blocks, Montessori rods and so forth. This is a
mapping between two kinds of numerals, and not two procedures. Later, this inter-numeral
mapping is appealed to in teaching carrying and borrowing. For example, a known procedure for
making change (trading a dime for ten pennies) is mapped into the borrowing procedure of
written subtraction. Since this kind of teaching is quite common in the primary grades, it seems
quite plausible that learning by analogy should be a prominent framework for learning procedures.

Presumably, once an analogy has transferred some knowledge, it is still available for use later
to transfer more knowledge about the procedure. In some cases, this predicts significant student
competence. For instance, if the student learned simple borrowing via the analogy, then it's quite
plausible that when confronted with more complex borrowing problems, such as

6 0 7 
- 2 3 8 

(assuming the student hasn't yet been taught how to solve such borrow across zero problems), the
student could solve the problem in the concrete domain by trading a dollar for nine dimes and ten
pennies, then map back into the written domain, thus producing the correct solution. Indeed, the
analogies used in instruction may have been designed so that these productive extensions of the
base analogy are encouraged.

But this is a much more productive understanding of borrowing than most students achieve.
As discussed in the preceding chapter, when certain students discover that it is impossible to
decrement the zero, they will do local problem solving, repairing their execution state. These
students do not use analogies to familiar procedures (e.g., making change). If the students had
learned their procedures via analogy, one would have to make ad hoc stipulations to explain why
they no longer used that analogy after they had learned the procedure. It's more plausible that they
simply didn't utilize the analogy in the first place. Similar comments apply to the analogies
between arithmetic and algebra. They would predict more algebraic competence than one typically
finds. Loosely speaking, learning by analogy is too good. It predicts that students would "repair"
impasses by constructing a correct extension to their current procedure. That is, they would debug
instead of repair. Since some students do apparently have repair-generated bugs, another
explanation would be needed for how these students acquired their procedures. At the very least,
analogy cannot be the only kind of learning going on, if it happens much at all.

Carbonell (1983) makes a telling argument about analogies between procedures. His ARIES
program was unable to form analogies between certain procedures when all it had was the program
(schematic) representations. However, Carbonell found that analogies could be forged when the
procedures were described teleologically (i.e., in Carbonell's terminology, the analogy is between
derivations of procedures). Suppose one stretches Carbonell's results a little and claims that
knowing the teleology (derivation) of procedures is necessary for procedural analogy, at least for
mathematical procedures. (Carbonell claims only sufficiency, if that.) Since most math students
are ignorant of the teleology of their procedures (section 3.1), one can conclude that students did
not acquire their procedures via analogy.

Analogies are hard to make

How is it that teachers can present material that is specifically designed to encourage learning
procedures by analogy, and yet their students show few signs of doing so? Winston's research
(Winston, 1979) yields a speculative answer. It indicates that the most computation intensive part of
analogy can be discovering how best to match the parts of the two sides of the analogy. To solve
electrical problems given hydraulic knowledge, one must match voltage, electrical current and
resistance to one each of pressure, water current and pipe size. There are 6 possible matchings, and
only one matching is correct. The number of possible matchings rises exponentially with the
number of parts. For a similar analogy, a best match had to be selected from 11! or 40 million
possible matches. The matching problem of analogy is a version of a well known NP-complete
problem: finding the maximal common subgraph of two digraphs (Hayes-Roth & McDermott,
1978). Hence, it is doubtful that a faster solution than an exponential one exists.
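The counts quoted above can be checked by brute-force enumeration. The sketch below treats a matching as a one-to-one pairing of parts, which is an assumption about how the matchings are counted; it is not Winston's matcher, and the function name `count_matchings` is hypothetical.

```python
from itertools import permutations
from math import factorial

electrical = ["voltage", "electrical current", "resistance"]
hydraulic = ["pressure", "water current", "pipe size"]

# Every one-to-one pairing of the electrical parts with the hydraulic parts.
matchings = [dict(zip(electrical, p)) for p in permutations(hydraulic)]


def count_matchings(n_parts):
    """Number of one-to-one matchings between two sets of n_parts parts."""
    return factorial(n_parts)


print(len(matchings), count_matchings(11))
```

With three parts per side there are 3! = 6 candidate matchings; with eleven there are 11! = 39,916,800, the "40 million" cited above.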

If computational complexity can be equated with cognitive difficulty, Winston's work would
predict that students would find it difficult to draw an analogy unless either it was a very simple
one or they were given some help in finding the matching. Resnick (1982) has produced some
experimental evidence supporting this prediction in the domain of mathematical instruction.
Resnick interviewed students who were taught addition and subtraction in school, using the usual
analogies between concrete and written numerals. It was discovered that some students had
mastered both the numeral analogy and the arithmetic procedures in the concrete domain, and yet
they could not make a connection between the concrete procedures and the written ones. Resnick
went on to demonstrate that students could easily make the mapping between the two procedures
provided that the steps of the two procedures were explicitly paired. That is, the student was
walked through the concrete procedure in parallel with the written one. A step in one was
immediately followed by the corresponding step(s) in the other. If we assume the conjecture from
above, that combinatorial explosions in mapping equate with difficulty for humans making
analogies, and we assume that "parts" of procedures roughly correspond to steps, then Resnick's
finding makes perfect sense. The procedures are currently presented in school in a non-parallel
mode. This forces students to solve the matching problem, and most seem unable to do so.
Consequently, the analogy does little good. Only when the instruction helps the students make the
matching, as it did in Resnick's experiment, does the analogy actually succeed in transferring
knowledge about one procedure to the other. In short, analogy could become a major learning
technique, but current instructional practices must be changed for it to do so.

Example-exercise analogies

There is anecdotal evidence that analogy is very common, but it is analogy of a very different
kind. In tutoring, I have watched students flip through the textbook to locate a worked problem
that is similar to the one they are currently trying to solve. They then draw a mapping of some
kind between the worked problem and their problem that enables them to solve their problem.
Anderson et al. report the same behavior for students solving geometry problems (1981). Although
the usage could be disputed, Anderson et al. call this kind of example-exercise mapping an analogy.
It differs from the kind of analogy discussed earlier. The abstraction that is common to the two
problem solutions is exactly the surface structure (program) of the procedure. In the analogy
between making change and borrowing, the common abstraction lay much deeper, somewhere in
the teleology of the procedure. To put it differently, the example-exercise analogy maps two action
sequences of a procedure together, thus illustrating the procedure's program. The other analogy
maps two distinct procedures together in order to illustrate a common teleology.

The former mapping, between two instances of a schematic object, is nearly identical to the
central operation of learning by examples. In both cases, the most specific common generalization
of the two instances is calculated. Winston also points out the equivalence of generalization and
analogy in such circumstances (Winston, 1979). Although I have not investigated example-exercise
analogy in detail, I expect it to behave almost indistinguishably from learning by generalizing
examples.
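One simple way to compute a most specific common generalization of two instances is anti-unification over nested tuples, sketched below. This is an assumed illustration and not the induction algorithm of the theory or of Winston's program: the function `generalize`, the action names, and the encoding of action sequences as tuples are all hypothetical.

```python
def generalize(x, y, bindings=None):
    """Most specific common generalization of two nested tuples.
    Identical parts are kept. Each distinct pair of differing parts is
    replaced by a variable, and a repeated pair reuses the same variable."""
    if bindings is None:
        bindings = {}
    if x == y:
        return x
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        return tuple(generalize(a, b, bindings) for a, b in zip(x, y))
    if (x, y) not in bindings:
        bindings[(x, y)] = "?v%d" % len(bindings)
    return bindings[(x, y)]


# Two action sequences for one column of carrying: a worked example
# (6 + 7) and the exercise being solved (8 + 5).
worked_example = (("add", 6, 7), ("write", 3), ("carry", 1))
exercise = (("add", 8, 5), ("write", 3), ("carry", 1))

print(generalize(worked_example, exercise))
```

The result keeps what the two action sequences share and abstracts what differs, which is why mapping a worked example onto an exercise and generalizing over two examples come to much the same computation.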

To summarize, one form of analogy (if it could be called that) is indistinguishable from
induction. The other form of analogy seems necessarily to involve the teleology of procedures.
Since students show little evidence of teleology, it is safe to assume that analogic learning is not
common in classrooms, perhaps because current instructional practices aren't encouraging it in quite
the right way.



3.5 Learning by being told

One framework for acquiring a procedure involves following a set of natural language 
instructions until the procedure is committed to memory. This framework for explaining learning is
called learning by being told. It views the central problem of learning as one of natural language
understanding. The key assumption is that the text describes the procedure in enough detail that all
the students need to do is understand the language; then they will be able to perform the
procedure. 

Manuals of procedures are ubiquitous in adult life. Examples are cookbooks, user guides, 
repair shop manuals and office procedure manuals. In using procedure manuals, adults sometimes
learn the procedures described therein, and cease to use the manuals. So learning by being told is
probably quite common among adults. The content of procedure manuals can be taken as a model
for how good a natural language description has to be if it is to be effective in teaching the 
procedure. 

Open any arithmetic text, and one immediately sees that it is not much like a cookbook or an
auto repair manual. There is very little text. The books are mostly exercises and worked examples.
The reason is obvious: since students in the primary grades are just beginning to read, they could
make little use of an elaborate written procedure.

Badre (1972) built an AI program that reads the prose and examples of a fourth grade
arithmetic textbook in order to learn procedures for multicolumn addition and subtraction. Badre
sought in vain for simple, concise statements of arithmetic procedures that he could use as input to
his natural language understanding program. He comments:

During the preliminary work of problem definition, we looked for a textbook that would
explain arithmetic operations as a clearly stated set of rules. The extensive efforts in this
search led to the following, somewhat surprising result: nowadays, young American
grade-school children are never told how to perform addition or subtraction in a general
way. They are supposed to infer the general algorithms from examples. Thus actual
texts are usually composed of a series of short illustrated 'stories.' Each story describes
an example of the execution of the addition or the subtraction algorithms. (Badre, 1972,
pp. 1-2)

Despite the fact that Badre's program "reads" the textbook's "stories" in order to obtain a
description of the examples, the role of reading in its learning is minimal. The heart of the
program is generalization of examples. In particular, the program employs only a few heuristics
that use the book's prose to disambiguate choices left open by generalization.

The preceding paragraphs discussed primary school students learning arithmetic. Algebra 
learners are secondary school students. Many can read well enough to use procedure manuals. In 
secondary school algebra texts, one sometimes finds "recipes" for solving equations and the like, but 
they are often too terse and ambiguous to serve as more than a simple reminder. Their level of 
detail suggests that such written procedures are used as summaries and not as the primary 
exposition. The fact that most of them are placed at the end of their chapters suggests that the 
textbook writers also see them as playing a secondary, summarizing role. If I may add anecdotal 
support from experiences as an algebra tutor, I have observed that students who flip through the 
textbook looking for help in solving a problem virtually always refer to a worked example rather 
than a recipe. This is consistent with the view that recipes play an integrative or summarizing role. 
They lack the detail to serve either as the main exposition of the procedure or even as a reference 
when additional details are sought. 

3.6 Summary and formalization 

This chapter presented two hypotheses. The theory needs them in order to get started in 
arguing competitively. Since there are no independently motivated hypotheses that can be used in 
arguing for these two, they can only be justified by making them seem plausible. They are, in this 
special sense, assumptions: hypotheses without proper support, but ones that the theory bears 
allegiance to. 

One hypothesis is that the knowledge that students acquire is schematic (at the level of a 
program) rather than teleologic (at the level of a specification for a program) or prototypical (at the 
level of a set of problem state sequences). All three descriptive levels are logically sufficient to 
describe a procedure. However, the behavior of students seems best to fit the hypothesis that their 
descriptions of procedures are schematic. The argument against prototypical knowledge is that 
students' problem-solving ability is much more productive, in the sense that they can solve problems 
that they have never seen before, than an account of procedural knowledge that is based on 
prototypes (i.e., memorized examples) would predict. On the other hand, if students possessed the 
teleology of their procedures, most impasses could be "repaired" by deriving a correct procedure 
(i.e., students would debug instead of repair). At least some students, the ones with bugs, must be 
lacking such teleological knowledge. Also, there is experimental evidence that some adults have no 
teleology for their arithmetic procedures. They either never learned it or they forgot it in such a 
way that the schematic level (program) was retained while the teleology was forgotten. All in all, it 
is more parsimonious to assume that students learn just the schematic level descriptions for their 
procedures. This implies that students' knowledge can be formalized by something like Lisp 
procedures or production systems. It is not necessary to use more powerful formalisms such as 
planning nets (VanLehn & Brown, 1980), planning calculi (Rich, 1981) or procedural nets 
(Sacerdoti, 1977). 

The second assumption is that students learn inductively. They generalize examples. There 
are several less plausible ways that procedures could be learned: (1) Learning-by-being-told 
explains procedure acquisition as the conversion of an external natural language information source, 
e.g., from a procedure manual, into an internal comprehension of the procedure. It is implausible 
in this domain because young students don't read well and older students' textbooks are not 
procedure manuals. (2) Learning-by-analogy is used in current mathematical curricula, but in ways 
that would produce an overly teleological understanding of the procedural skills. If students really 
understood the analogies, they wouldn't develop the bugs that they do. (3) Discovery learning 
requires that students have enough initial knowledge of the task that they can muddle through to a 
solution. Discovery learning describes how students develop solution procedures from their initial, 
trial-and-error problem solving. However, mathematical tasks have difficult specifications that make 
it unlikely that a student would blunder into a correct solution of, e.g., a subtraction problem. Even 
when these specifications are made salient, there is some experimental evidence indicating that the 
initial trial-and-error problem solving is too hard for many students. Besides, if students did use 
discovery learning, their knowledge of the procedures would be overly teleological. Of the various 
ways to learn procedures, only induction seems both to fit the facts of classroom life and to account 
for the schematic (program) level of knowledge that students appear to acquire. 



Formalizing the hypotheses using constraints on undefined functions 

Two functions, named Learn and Cycle, will be used to formalize the theory. The 
functions will not be defined. Instead, they will gradually acquire meaning as the hypotheses of the 
theory are stated in terms of them. In the next chapter, for instance, Learn will be defined in terms 
of three new undefined functions, and some constraints will be added concerning how the three 
functions interact. In effect, these new functions and constraints will be a partial definition of 
Learn. Later chapters will introduce further constraints. When all constraints have been made, 
there will still be many ways to define the various functions. Sierra provides one definition for 
each. The intention is that any other definition would do as well as Sierra does at predicting the 
data. 

To put it a little differently, the endeavor in the following chapters is to accumulate a set of 
semi-formal specifications for Sierra. As new empirical facts come to light, new specifications must 
be imposed on Sierra in order that its performance corresponds to the new facts. To put it baldly, 
the endeavor is to uncover a teleology for Sierra. Chapter 2 presented Sierra at the schematic 
(program) level. The remaining chapters build up its teleology. The structure of interlocking 
competitive arguments is exactly a teleology for Sierra, except that it stops short of the actual code 
itself. It is a teleology for a class of Sierra-equivalents. 

The following is a list of the nomenclature, with comments on their intended meanings. 

L_1 ... L_i ... L_n   A sequence of lessons. 

L                     A variable designating a lesson. 

P                     A variable designating the student's procedure. 

(Examples L)          A function that returns the examples contained in its argument, which is a lesson. 

(Exercise L)          A function that returns the exercises contained in its argument, which is a lesson. 

(Learn P L)           An undefined function that returns a set of procedures. Its first argument, P, is a 
                      procedure, and its second argument, L, is a lesson. It represents the various ways 
                      that its input procedure can be augmented to assimilate the lesson. 

S                     A variable designating the current runtime state. 

(Internal S)          A function that returns the internal (execution, or interpreter) state. 

(External S)          A function that returns the external (problem) state. The current state is a 
                      composite of the internal and external state. 

(Cycle P S)           An undefined function that inputs a procedure and a runtime state and outputs a 
                      set of possible next states. It represents one cycle in the 
                      interpretation/execution of the procedure. 

Defining inductive learning is easy if one can use a consistency constraint such as the following one: 
If L is a lesson, and g is a generalization induced from it, then g must be consistent with all the 
examples in L. In the case of procedures, a procedure P is consistent with an example if its solution 
to the example's exercise problem is the same problem state sequence as the example's problem 
state sequence. That is, consistency means a student procedure solves the lesson's example exercises 
using the same writing actions that the teacher did. Given a lesson sequence, L_1 ... L_n, the set of 
observable procedures is obtained by chaining. That is, procedure P_i is learnable when it is induced 
from procedure P_{i-1} during lesson L_i and is consistent with lesson L_i, and P_{i-1} is learnable. This 
recursive definition defines the set of learnable procedures to include ones that are intermediate 
procedures as well as the procedures that the learners have when they reach the end of the lesson 
sequence. Although it might appear a little complicated, there is nothing special going on. This is 
an ordinary way to formalize incremental induction. The formal hypotheses are: 

Incremental Learning 

Given a lesson sequence L_1 ... L_n and an initial procedure P_0: 
Procedure P_i is a core procedure if and only if 
(1) P_i = P_0, or 
(2) P_i ∈ (Learn P_{i-1} L_i) and P_{i-1} is a core procedure. 

Induction 

If P_i ∈ (Learn P_{i-1} L_i) then for each example problem x in (Examples L_i), 
the problem state sequence that is P_i's solution to x is equal to the problem state 
sequence that is the solution to x used in the example. 
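
As an informal illustration, the two hypotheses can be sketched in Python. Everything specific 
in the sketch is an assumption made for the illustration and not part of the theory: procedures 
are modeled as sets of rule names, the rules diff and show and the toy solve interpreter are 
hypothetical, and the learn function (which keeps the procedure or adds one rule, retaining only 
candidates that reproduce the lesson's examples) is merely one possible stand-in for the undefined 
Learn. 

```python
RULES = ("diff", "show")  # hypothetical primitive actions, assumed for this sketch

def solve(procedure, problem):
    """Toy interpreter: return the problem state sequence (answer digits so far)."""
    answer, states = (), []
    for top, bottom in problem:               # columns, rightmost first
        if bottom is not None and "diff" in procedure:
            answer += (top - bottom,)         # write the column difference
        elif bottom is None and "show" in procedure:
            answer += (top,)                  # copy the top digit
        else:
            break                             # no applicable rule: stop
        states.append(answer)
    return tuple(states)

def consistent(procedure, lesson):
    """Induction: the procedure reproduces every example's state sequence."""
    return all(solve(procedure, problem) == states for problem, states in lesson)

def learn(procedure, lesson):
    """Hypothetical Learn: keep the procedure or add one rule, if consistent."""
    candidates = {procedure} | {procedure | {rule} for rule in RULES}
    return {c for c in candidates if consistent(c, lesson)}

def core_procedures(p0, lessons):
    """Incremental Learning: chain learn through the lesson sequence."""
    core, frontier = {p0}, {p0}
    for lesson in lessons:
        frontier = {p for prev in frontier for p in learn(prev, lesson)}
        core |= frontier
    return core

# Each lesson is a list of (problem, teacher's problem state sequence) pairs.
teacher = frozenset(RULES)
problems = [((5, 2), (7, 3)), ((5, 2), (7, None))]
lesson1, lesson2 = ([(p, solve(teacher, p))] for p in problems)
print(core_procedures(frozenset(), [lesson1, lesson2]))
```

In this toy run the core procedures include the intermediate procedure that knows only diff, 
which mirrors the point that the recursive definition covers intermediate as well as final 
procedures. 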

These hypotheses express the second assumption presented in this chapter, that learning is inductive 
in this domain. The first assumption is not as easy to formalize. There is no standard way to 
formally distinguish between a schematic procedure (program) and a teleologically described 
procedure (a teleology). The best that can be done is to appeal to the intuitive notion of a 
runtime state (notated S). It changes during problem solving while another information structure, 
the procedure (notated P), does not change. Later on, this inability to express the assumption 
formally won't matter because a formal representation for procedures will have been defined. The 
following principle, as well as the notation defined above, is aimed at providing a foundation for the 
definition of the knowledge representation and its use in problem solving. 

Predictions 

If S_0 is the initial state such that (External S_0) is a test exercise, then the set of 
predicted problem state sequences for students with procedure P is exactly the set 
{<(External S_0) ... (External S_n)> | ∀i S_i ∈ (Cycle P S_{i-1})}. 
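
A rough sketch of how such a prediction set could be enumerated is given below. The names 
predicted_sequences, toy_cycle, toy_external and toy_procedure are hypothetical, and two choices 
are assumptions of the sketch rather than of the theory: a sequence is taken to end when the 
cycle function returns no successor states, and a step limit guards against non-terminating 
procedures. 

```python
def predicted_sequences(procedure, s0, cycle, external, max_steps=50):
    """Enumerate <External S_0 ... External S_n> over all chains S_i in cycle(P, S_{i-1})."""
    results = set()

    def walk(state, trace, depth):
        successors = cycle(procedure, state)
        if not successors or depth == max_steps:
            results.add(tuple(trace))          # assumed stopping rule: no successors
            return
        for nxt in successors:                 # non-determinism: follow every successor
            walk(nxt, trace + [external(nxt)], depth + 1)

    walk(s0, [external(s0)], 0)
    return results

# Toy stand-ins: a runtime state is (internal state, digits written so far).
toy_procedure = {0: [(1, "3")], 1: [(2, "4"), (2, "7")]}

def toy_cycle(procedure, state):
    internal, written = state
    return {(nxt, written + digit) for nxt, digit in procedure.get(internal, [])}

def toy_external(state):
    return state[1]                            # only the written digits are observable

predictions = predicted_sequences(toy_procedure, (0, ""), toy_cycle, toy_external)
print(predictions)
```

Because the toy procedure offers two choices in its second cycle, the same test problem yields 
two predicted problem state sequences. 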

This constraint puts the teeth into the whole theory. It connects observable, testable predictions 
with the predicted procedures. It defines the solution of a problem to be a sequence of runtime 
states S_i. Since Cycle is non-deterministic in that it can output more than one runtime state, many 
solution sequences are possible for the same procedure P and the same test problem (External 
S_0). Note that only the sequence of problem states, the projection of the S_i via External, is 
observable. It is just barely worth mentioning that this formalization ducks the minor issues of 
initial and final internal states. 

The Disjunction Problem 



This chapter argues that a key problem which any inducer faces is controlling disjunction. If 
the class of all generalizations is specified in such a way that disjunctions are unconstrained, then an 
inducer will be unable to identify which generalization it is being taught even if it is given an 
infinite number of positive and negative examples. It is only when the inducer is given all possible 
examples that it can succeed. This is physically impossible in most interesting domains. Any 
physically realized inducer that can learn successfully must be performing induction under some set 
of constraints on disjunctions. In the previous chapter, it was assumed that students learn 
mathematical procedures inductively. Since they don't require infinitely many examples to do so, 
there must be constraints on the way disjunctions are induced. The task of this chapter is to find 
out what those constraints are. 

Research on machine induction has discovered several methods that solve the disjunction 
problem and thus enable mechanical inducers to succeed in finite time. For instance, two classic 
methods are to bar disjunctions from generalizations or to bias the inducer against generalizations 
that use disjunctions. These methods, or any methods that solve the disjunction problem, could be 
the one used by people. It is an empirical question which method people actually use. It will be 
shown that the method used by students in this domain is the one-disjunct-per-lesson felicity 
condition, which was mentioned in chapter 1. Before arguing in defense of the felicity condition, 
the disjunction problem will be introduced and several solutions will be discussed. 

4.1 An introduction to the disjunction problem 

By "disjunction," I mean the following: Suppose that g and g' are two generalizations from 
the class of all possible generalizations. They each have an extension, where a generalization's 
extension is the set of all possible examples (instances) consistent with the generalization. The 
generalization is a generalization of each object in its extension and no other objects. A 
generalization's extension is usually an infinite set. Let x and x' be the extensions of g and g', 
respectively. The disjunction of g and g' is any generalization whose extension is the union of x 
and x'. 
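
The definition can be illustrated with a small sketch, under the simplifying assumption that the 
space of examples is finite so that extensions can be enumerated (in general they are infinite). 
The universe of shapes and the names extension, disjunction, is_wedge and is_brick are 
hypothetical and chosen only for the illustration. 

```python
def extension(generalization, universe):
    """The set of examples in the universe consistent with the generalization."""
    return {x for x in universe if generalization(x)}

def disjunction(g1, g2):
    """A generalization whose extension is the union of the two extensions."""
    return lambda x: g1(x) or g2(x)

# A tiny, finite universe of examples, assumed for this sketch.
universe = {"wedge", "brick", "cylinder", "pyramid"}

def is_wedge(x):
    return x == "wedge"

def is_brick(x):
    return x == "brick"

wedge_or_brick = disjunction(is_wedge, is_brick)
print(extension(wedge_or_brick, universe))
```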

It is often the case that a representation language is used to define the class of all possible 
generalizations. If so, disjunction usually corresponds to certain operators or constructions in the 
representation language. Disjunction takes two or more generalizations and produces a new one 
such that the extension of the new generalization is exactly the union of the extensions of the old 
generalizations. For instance, the disjunction of two predicates, e.g., (WEDGE x) and (BRICK x), 
is their logical disjunction, (WEDGE x) V (BRICK x). The disjunction of two context-free 
grammars is a grammar whose rule set is the concatenation of their two rule sets (assuming all 
non-terminals except the root have distinct names). 

Figure 4-1 
Disjunction of two flow charts. 



The disjunction of two procedures A and B is their concatenation plus the new top-level 
statement: "If such-and-such is true then call A else call B." To see this in a little more detail, 
suppose that procedures are represented as flow charts (see figure 4-1). Each flow chart has a 
designated node, labeled Enter. To disjoin two flow charts, a conditional branch is placed 
between the two Enter nodes of the two flow charts. This forms a single, new flow chart. The 
test (predicate) inside the conditional branch could be arbitrary (i.e., a random true-false generator) 
or it could be something specific. It doesn't matter as long as the extension of the new flow chart is 
the union of the extensions of the two old flow charts. The extension of a flow chart could be 
considered to be the set of all action sequences (or equivalently, as the set of all problem state 
sequences. Denotational semantics provides a more general and rigorous treatment of extensions of 
procedures. See Stoy, 1977). 

If the two flow charts were very similar, then the same effect could be achieved by merging 
them. For instance, suppose the two flow charts were identical except for one conditional branch's 
test (see figure 4-2). In one flow chart, the test is P; in the other flow chart, the test is Q. Given 
this similarity, the disjunction is a flow chart with (OR P Q) as its test. Disjunction in the 
procedural domain has a rather broad interpretation: it can introduce new control structure or just 
modify internal tests and actions. 

Figure 4-2 
Internal disjunction of two flow charts. 



The disjunction problem 

Induction's trouble occurs when the class of all possible generalizations admits free 
disjunction. That is, the disjunction of g with g' is in the class whenever g and g' are. When this is 
the case, induction acquires some strange properties that make it seem unlikely as a form of human 
learning. 

Free use of disjunction allows induction to generate absurdly specific generalizations. One 
such absurdity is the trivially specific generalization: a disjunction whose disjuncts are exactly the 
positive examples that the learner received. Thus, if the inducer received positive examples a, b and 
c, then the disjunction (OR a b c) is the trivially specific generalization. It has three disjuncts, 
namely the three examples (or rather, complete descriptions of each example). The trivially specific 
generalization is not really a generalization at all. Its extension is just {a b c}. The inducer didn't 
really generalize, it just remembered. Clearly, this is not the only kind of knowledge acquisition 
that people do, especially students of mathematical skills. There must be some constraints on 
induction that people have. For instance, they could be biased against trivially specific 
generalizations. This would take care of one problem, but there is another problem that is much 
worse. It is the heart of the disjunction problem. 

When disjunctions are unconstrained, the inducer has to be given the complete extension of 
the generalization being taught before it can reliably discriminate that generalization from the 
others. To see this, first assume that for each possible example, there is a generalization in the class 
of all possible generalizations whose extension is that example and only that example. (If this 
assumption is not true, then we can reformulate the example space into a space of equivalence 
classes such that the examples in a class cannot be distinguished by generalizations.) For instance, a 
grammar consisting only of the rule S->a is such a generalization, where S is the root category and 
a is a string of terminals. This grammar's extension is the singleton set {a}. Using such singleton 
generalizations and disjunction, any finite set of examples can be described by some generalization. 
To get the generalization for {a, b}, one finds the generalizations for {a} and for {b}, then takes 
their disjunction. Since all sets of examples correspond to generalizations, the inducer can't tell 
which generalization is correct until it is told exactly what the target generalization's extension is. 
This means it must be shown all possible examples, and told which are positive examples and which 
are negative examples. For grammar induction the learner must be shown all possible finite strings. 
There are an infinite number of them, so this is an impossible task. (In fact, for grammar 
induction, it is easy to prove that there are infinitely many grammars consistent with any finite set 
of strings.) 
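
The argument can be made concrete with a small counting sketch. It assumes, purely for 
illustration, a finite universe of strings (all strings over the alphabet {a, b} of length at most 
two); in the real case the universe is infinite. The function names all_strings and 
consistent_generalizations are hypothetical. With singleton generalizations and free disjunction, 
every subset of the universe is some generalization's extension, so each example that has not yet 
been labeled doubles the number of generalizations consistent with the data. 

```python
from itertools import combinations, product

def all_strings(alphabet, max_len):
    """All non-empty strings over the alphabet up to max_len (a toy, finite universe)."""
    return ["".join(p) for n in range(1, max_len + 1)
            for p in product(alphabet, repeat=n)]

def consistent_generalizations(universe, positives, negatives):
    """Extensions that contain every positive example and no negative example."""
    unseen = [x for x in universe if x not in positives and x not in negatives]
    result = []
    for k in range(len(unseen) + 1):
        for extra in combinations(unseen, k):   # any subset of unseen examples may be added
            result.append(frozenset(positives) | frozenset(extra))
    return result

universe = all_strings("ab", 2)                 # a, b, aa, ab, ba, bb
partial = consistent_generalizations(universe, {"a", "aa"}, {"b"})
complete = consistent_generalizations(universe, {"a", "aa"}, {"b", "ab", "ba", "bb"})
print(len(partial), len(complete))
```

Only when every example in the universe has been labeled does a single candidate remain. 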

The only way for the inducer to learn without receiving infinitely many examples is to bias 
the learner or to constrain the use of disjunction in the representation language. Goodman (1955) 
calls this the old riddle of induction: to learn anything at all, you either have to be biased or 
partially blind. By hypothesis, students do learn inductively. So it is only a question of whether 
they are biased, partially blind, or have some other way of solving the disjunction problem. 

Prior solutions to the disjunction problem 

Research in induction has used five major techniques for solving the disjunction problem. 
These will be reviewed briefly (see Cohen & Feigenbaum, 1983, for details). One technique has 
been mentioned already: disjunction-free induction. Winston's arch-learner was of this type (see 
section 16). Disjunctive generalizations are simply banned from the class of possible 
generalizations. This technique is the only one of the four that solves the problem by putting 
constraints on generalizations. (The remainder of this subsection is somewhat technical and can be 
skipped.) 

The second "technique" is based on a celebrated theorem of Gold (1967). As it turns out, the 
technique is totally impractical in most cases because it requires that the inducer receive infinitely 
many examples. Gold's work dealt specifically with inducing grammars, but the results are more 
general. He proved three important theorems: (1) If a certain brute force inducer receives only 
positive examples, then it cannot learn the target generalization, except when the class of all possible 
generalizations is extremely restricted. (2) If the inducer receives both positive and negative 
examples, then it will eventually converge on the correct target generalization. (3) This brute force 
inducer is equivalent to all other inducers with respect to the theorems' results. Recently, certain 
psycholinguists have taken these results to mean that it will suffice to explain how language is 
learned if it can be shown that babies receive negative examples (see Pinker, 1979, for a review of 
this position). If they do, then they have all the information they need to induce their language, 
and moreover, it is pointless to inquire what kind of induction algorithm they might be using since 
all such algorithms are, in a certain sense, equivalent. This position chooses to ignore the fact that 
Gold's second theorem requires that the inducer receive all possible examples, each bearing an 
indication of whether it is a positive example or a negative example. For any interesting 
formulation of inductive learning, this makes the example sequence be infinitely long. In 
concentrating on whether babies receive negative examples, the position ignores the physical 
impossibility of an infinite example sequence. Perhaps the infinite set of examples is taken as an 
idealization of a very large set, the set of all sentences that a baby hears while it is learning a 
language. However, the completeness of the example set is used crucially in the proof of Gold's 
theorems. It is easy to produce counterexamples to the theorem when the condition of 
completeness is violated. To put it differently, all that Gold really showed is that his brute force 
inducer converges on the correct generalization if and only if the example set is complete. The 
negative examples issue is a red herring. Given a finite example set, the brute force inducer may 
fail even if it does have negative examples. The only way to account for learning is to either (1) 
postulate strong restrictions on the class of all possible generalizations, as Winston did, or (2) to 
postulate a bias, as the remaining three techniques do. 

The third technique uses a biasing measure based on extensions. For convenience in stating 
the bias, it allows only one top-level disjunction in its generalizations. That is, a generalization has 
the form (OR c_1 c_2 ... c_n) where the disjuncts c_i are disjunction-free generalizations. This is not a 
constraint on the class of possible generalizations. It does not decrease the expressive power of the 
class. It only puts the generalization in a form that makes it convenient to apply the bias measure. 
Bias is decided by comparing the coverage of individual disjuncts c_i. Given an example sequence, 
the coverage of a c_i is the set of examples in that sequence that it is consistent with. The coverage 
of a c_i is the intersection of its extension and the example sequence. Coverage is used to formalize 
biases. Various biases have been used reflecting varying assumptions about the induction task 
under study. Brown (1972) uses a bias that favors a generalization that has one c_i with as large a 
coverage as possible, along with an arbitrary number of c_i with small coverages. His induction task 
involves hypothesis formation over noisy data. The c_i with the largest coverage is the hypothesis. 
The other c_i cover the noise data. In other applications, the learner is biased to expect multiple 
hypotheses of about the same coverage (e.g., Vere, 1978; 1975; Hayes-Roth & McDermott, 1976). In 
this case, the bias is to take as few c_i as possible, each with the largest coverage possible. This bias 
implements one interpretation of Occam's razor. 
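
One way to picture the coverage-based bias is the following sketch. It is not the algorithm of 
any of the cited systems: the greedy selection strategy, the integer examples, and the candidate 
disjuncts (even, small, is_1, is_3) are all assumptions made for the illustration. Coverage is 
computed exactly as defined above, as the set of examples in the sequence that a disjunct is 
consistent with. 

```python
def coverage(disjunct, examples):
    """The examples in the sequence that the disjunct is consistent with."""
    return {e for e in examples if disjunct(e)}

def fewest_disjuncts(candidates, examples):
    """Greedy approximation (an assumption of this sketch): repeatedly take the
    disjunct covering the most still-uncovered examples."""
    remaining = set(examples)
    chosen = []
    while remaining:
        name, disjunct = max(candidates.items(),
                             key=lambda item: len(coverage(item[1], remaining)))
        gained = coverage(disjunct, remaining)
        if not gained:
            break                               # nothing covers the leftover examples
        chosen.append(name)
        remaining -= gained
    return chosen, remaining

# Hypothetical disjunction-free candidates over integer examples.
candidates = {
    "even": lambda x: x % 2 == 0,
    "small": lambda x: x < 4,
    "is_1": lambda x: x == 1,
    "is_3": lambda x: x == 3,
}
examples = {1, 2, 3, 4, 6, 8}
chosen, uncovered = fewest_disjuncts(candidates, examples)
print(chosen, uncovered)
```

Here two disjuncts with large coverage are preferred over a generalization that would spend a 
separate disjunct on each of the examples 1 and 3. 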

The fourth technique is stochastic. The example set given to the learner has redundant 
examples. That is, an example may occur many times. The inducer's bias is based on finding a 
generalization that best fits the given example distribution. Generalizations are equipped with 
probabilities that are used to predict the example distribution. In particular, probabilities are 
assigned to disjuncts. Given a disjunctive concept, (OR c_1 c_2), a probability P is assigned to c_1 
while 1-P is assigned to c_2. The inducer's bias depends on a certain computation that calculates 
the likelihood of an example given a generalization that has probabilities assigned to its disjuncts. 
Then, for each generalization that the inducer constructs, probabilities are assigned to its disjuncts in 
such a way that the likelihood of the example distribution is maximized. The inducer then chooses 
the generalization that yields the maximal likelihood value. Thus, the inducer's bias is to choose 
generalizations that best predict the distribution of examples. A generalization's likelihood will 
depend on how many disjunctions the generalization has and where they occur. In general, the 
more disjunctions, the easier it is to fit the generalization to the examples, and hence the higher the 
likelihood value. A compensating bias is needed, otherwise the inducer will tend to generate 
trivially specific generalizations. Horning (1969) assigns prior probabilities to the generalizations in 
such a way that generalizations with more disjunctions (i.e., more degrees of freedom) have less 
prior probability. 

A fifth major technique for solving the disjunction problem is to rate generalizations with 
some complexity measure (e.g., count the number of symbols needed to express it, as in Chomsky, 
1975). The inducer is biased to choose the simplest generalization. This technique will often tend 
to overgeneralize, especially if there are few or no negative examples. For instance, a grammar 
inducer will tend to choose a grammar for the universal language (i.e., all possible finite strings) since 
universal grammars are often quite simple. Feldman (1972) balanced the complexity of the 
grammar itself against a second complexity measure, one based on the complexity of the derivation 
of the example strings from the grammar. For instance, one might measure a derivation's 
complexity by counting the number of parse nodes in the string's parse tree. Another technique is 
to balance the complexity-based bias, which tends to overgeneralize, against the likelihood-based 
bias, which tends to undergeneralize. 
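
The tension between a likelihood-based bias and a compensating penalty on disjuncts can be 
sketched as follows. The probabilistic model is an assumption of the sketch and is not taken from 
the cited work: disjuncts have finite, disjoint extensions, a disjunct is chosen with its assigned 
probability, and an example is then drawn uniformly from that disjunct's extension. The fixed 
penalty per disjunct is likewise a hypothetical stand-in for a prior that gives less probability to 
generalizations with more disjuncts. 

```python
import math
from collections import Counter

def max_log_likelihood(disjuncts, observed):
    """Log-likelihood of the observed examples with disjunct probabilities set to
    their maximum-likelihood values (the fraction of examples each disjunct covers)."""
    counts = Counter(observed)
    total = len(observed)
    covered = set().union(*disjuncts)
    if any(example not in covered for example in counts):
        return float("-inf")                    # some observed example is excluded
    loglik = 0.0
    for ext in disjuncts:
        n = sum(counts[e] for e in ext)         # observations falling in this disjunct
        if n:
            loglik += n * (math.log(n / total) - math.log(len(ext)))
    return loglik

def score(disjuncts, observed, penalty=10.0):
    """Likelihood balanced against an assumed fixed cost per disjunct."""
    return max_log_likelihood(disjuncts, observed) - penalty * len(disjuncts)

observed = ["a"] * 5 + ["b"] * 5
general = [{"a", "b", "c", "d"}]                # one broad disjunct
specific = [{"a"}, {"b"}]                       # trivially specific: one disjunct per example
best = max([general, specific], key=lambda g: score(g, observed))
print(max_log_likelihood(general, observed), max_log_likelihood(specific, observed))
```

On likelihood alone the trivially specific generalization wins; with the assumed penalty the 
broader, single-disjunct generalization is chosen instead. The size of the penalty is arbitrary 
here, and a smaller value would leave the trivially specific generalization in front. 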

In addition to these five major techniques, there are many heuristic induction algorithms. It is 
often difficult to tell what their biases are. Consequently, they may have limited interest for 
theoretical psychologists. A heuristic inducer has been built for algebra equation solving, a domain 
presently under consideration. Neves' program, ALEX, includes procedures for solving algebra 
equations (Neves, 1981). ALEX's biases are woven into an algorithm for generalizing examples. 
ALEX will be discussed later as a representative for the class of heuristic approaches. 

Competing hypotheses considered in this chapter 

With this background in hand, it is time to consider how people solve the disjunction 
problem. Not all of the alternatives discussed above will be considered for this theory. The 
competition will be between the following five hypotheses: 

1. No disjunctions: Disjunctions are not induced. Instead, they are there already, implicitly, in the 
set of primitive concepts that learners have when instruction begins. This solution to the 
disjunction problem is used by Winston (1975), Mitchell (1982), and others. 

2. Domain-specific heuristics: Neves' program will be discussed as an example of induction 
using ad hoc, domain specific biases. 

3. Minimal disjuncts: The learner is biased to take generalizations with the fewest disjuncts possible. 
This is the essence of the solution used by Iba (1979), Michalski (1969, 1975), and others. 

4. Exactly one disjunct per lesson: Given that the sequence of examples is partitioned into lessons, 
the learner acquires one new disjunct (subprocedure) per lesson. 

5. At most one disjunct per lesson: Given that the sequence of examples is partitioned into lessons, 
the learner acquires at most one new disjunct (subprocedure) per lesson. That is, if the lesson 
doesn't require that the procedure be given a new disjunction, none will be installed. 
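
The difference between hypotheses 4 and 5 can be stated as a filter on the candidate procedures 
that a lesson may produce. The sketch below is only illustrative: procedures are modeled as sets 
of disjunct (subprocedure) names, the names diff, show and borrow are hypothetical, and the 
function admissible is not part of the theory's formal apparatus. 

```python
def admissible(old, candidate, policy):
    """Does the candidate respect the given disjunct-per-lesson policy?"""
    if not old <= candidate:
        return False                            # learning only adds disjuncts here
    added = len(candidate - old)
    if policy == "exactly_one":
        return added == 1
    if policy == "at_most_one":
        return added <= 1
    raise ValueError("unknown policy: " + policy)

old = frozenset({"diff"})
candidates = [
    frozenset({"diff"}),                        # no new disjunct
    frozenset({"diff", "show"}),                # one new disjunct
    frozenset({"diff", "show", "borrow"}),      # two new disjuncts
]
exactly_one = [c for c in candidates if admissible(old, c, "exactly_one")]
at_most_one = [c for c in candidates if admissible(old, c, "at_most_one")]
print(len(exactly_one), len(at_most_one))
```

Under the at-most-one policy a lesson that needs no new disjunct leaves the procedure's 
disjuncts unchanged, whereas the exactly-one policy would force a new disjunct to be installed 
anyway. 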

The last alternative listed above is the one adopted by the theory. The first two competitors fall 
because they require implausibly strong assumptions about the students' states of knowledge prior to 
instruction. The fourth solution, introducing exactly one disjunction per lesson, makes bad 
predictions. The third solution, minimizing disjuncts, is empirically indistinguishable from the fifth 
solution, the one taken by the theory. However, the minimal-disjuncts hypothesis does not explain 
why lessons help instruction. It predicts, instead, that students would do just as well without the 
partitioning that lessons give the example sequence. They would learn identically from a sequence 
of examples chopped into hour-long slices at arbitrary places. Hence, the minimal-disjuncts 
hypothesis is rejected on grounds of explanatory adequacy: It does not explain why lesson structure 
has been found so universally helpful in education. 

4.2 Barring disjunction from procedures 

On one view, an inductive learning theory must either restrict the use of disjunction in the 
representation language or bias the inducer against making disjunctive generalizations. The former 
position is a little simpler. Despite the fact that it hasn't a prayer of explaining skill acquisition, it 
will be considered first because it provides an easy introduction to the tacit issues involved in 
controlling disjunctions. 

By analogy with Winston's arch-learner, it is easy to imagine a disjunction-free representation 
language for procedures. Figure 4-3 shows a Winstonian representation for a worked subtraction 
problem. The representation is a semantic net. The three nodes labeled a, b and c are the three 
visible writing actions of the example solution. The learner recognizes them as instances of the 
DIFF primitive, where DIFF is an action that takes the column difference and writes it in the 
column's answer place. Each of the three actions ISA DIFF. The representation uses an AT link to 
record which column the DIFF was taken in. The NEXT link represents the temporal sequence of 
the actions. The LEFT link represents the relative positions of the columns. 

Figure 4-3 
A worked subtraction exercise (on the left) represented as a semantic net. 

Winston uses a special construction for representing groups of similar objects. In this 
example, the actions a, b and c are represented as a group. This is indicated by the fact that they 
are parts (via three One-Part-Is links) of the GROUP node. The group has a 
typical member, labeled m, which ISA DIFF. Winston uses group nodes for iterative block 
structures, such as a column of arbitrarily many blocks. Here a group node is being used to 
represent a loop, a group of arbitrarily many column processing actions. Needless to say, more net 
structure than that shown in the figure would be necessary to do an adequate job of representing 
subtraction's main loop. However, this simple diagram gives enough detail to allow appreciation of 
the fundamental problem with this disjunction-barring approach. 

The fundamental problem becomes apparent when a second subtraction problem is shown to
the learner. Figure 4-4 illustrates the worked exercise and the semantic net for the generalization
that the learner should produce.

Figure 4-4

A worked subtraction exercise represented as a semantic net showing disjunctive column action.

To induce the correct subtraction procedure (as many students do, so this is one criterion for
adequacy), the learner has to recognize that the third action is not a DIFF, but another primitive
action called SHOW. The SHOW action simply copies the top digit of a column into the answer.
Given that the third action is not a DIFF, the typical member of the group (loop) cannot be a
DIFF. It has to be the disjunction of DIFF and SHOW, which is represented by the node SUB1COL.
This kind of generalization is just what Winston's arch learner does to induce that the lintel
could be any PRISM. Prior to instruction, it thinks lintels were BRICKs. It is shown an arch with
a WEDGE lintel. It induces that lintels could be any kind of PRISM.
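
The role of the disjunction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names mirror the DIFF, SHOW and SUB1COL nodes of the text, the encoding of a column as a (top, bottom) pair is hypothetical, and the problem 657 - 43 is an invented example, not one from the figures.

```python
from typing import List, Optional, Tuple

Column = Tuple[int, Optional[int]]  # (top digit, bottom digit or None if blank)

def diff(top: int, bottom: Optional[int]) -> Optional[int]:
    """DIFF: write the column difference; needs a bottom digit and no borrow."""
    if bottom is None or top < bottom:
        return None  # the action does not apply to this column
    return top - bottom

def show(top: int, bottom: Optional[int]) -> Optional[int]:
    """SHOW: copy the top digit into the answer; applies only to a blank bottom."""
    return top if bottom is None else None

def sub1col(top: int, bottom: Optional[int]) -> Optional[int]:
    """SUB1COL: the disjunction of DIFF and SHOW."""
    result = diff(top, bottom)
    return result if result is not None else show(top, bottom)

def solve(columns: List[Column], action) -> List[Optional[int]]:
    """Apply one column action to every column, units first; None marks a failure."""
    return [action(top, bottom) for top, bottom in columns]

# 657 - 43, units column first; the hundreds column has a blank bottom.
columns = [(7, 3), (5, 4), (6, None)]
diff_only = solve(columns, diff)            # the loop's typical member is DIFF
with_disjunction = solve(columns, sub1col)  # the typical member is SUB1COL
```

With DIFF as the only loop body, the hundreds column cannot be processed; with the disjunction, every column receives an answer digit.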

However, there are major differences between the blocks world domain and the domain of
written calculations. It is plausible that a child might know enough intuitive solid geometry to have
a concept PRISM which disjoins BRICK, WEDGE, and a few other solids. However, it is rather
implausible that a child has a concept SUB1COL that disjoins exactly the actions DIFF and SHOW.
My intuition is that the closest a child would come to such a "naturally occurring" disjunction
would be a concept, call it DO1COL, that disjoins four actions: DIFF, SHOW, taking the column
sum, and writing the bottom digit of the column as the answer. It would take an experienced
subtraction student to know that only DIFF and SHOW are members of subtraction's loop, and that
the other actions are not. Without this experience, a student could only induce that any DO1COL is
okay as a column processing action. This would generate bugs. The following problem illustrates
the misconception that any DO1COL is okay as a column operation:

8 4 6 
- 1 2 1 
16 4 

DIFF was used for the units column, the column sum was used for the tens, and the solver wrote
the bottom digit as the answer for the hundreds column. Debuggy cannot diagnose
non-deterministic bugs such as this one, hence no such bug occurs in our database. Nonetheless, it
seems a plausible prediction. However, the disjunction-barring approach predicts that all students
will have this bug, which is clearly false. The language cannot represent the correct procedure
unless it has SUB1COL. It can't form SUB1COL with a condition because disjunction is banned.
This approach can only fit the facts if SUB1COL is known by students before they take subtraction
lessons. That is an absurd assumption. Barring disjunctions from the representation language fails.

It fails because it tends to overgeneralize. This is just what one would expect. It was shown
earlier that when induction is allowed to generate arbitrarily many disjunctions, it undergeneralizes,
yielding certain trivially specific generalizations. When induction is allowed to generate no
disjunctions, it tends to overgeneralize. Describing human learning requires finding a middle way
between free disjunction and disjunction barring. One way is to provide the learner with a set of
"prefabricated" disjunctions, as Winston did. This is a nativist approach. However, such nativism
is implausible for the calculation domain. It is absurd to assume that SUB1COL is innate, or even
that it is learned prior to formal instruction in mathematics.

4.3 Neves' ALEX learner

Neves' program, ALEX, induces procedures for solving algebra equations (Neves, 1981).
ALEX's biases are woven into an algorithm for generalizing an example. The algorithm has rules
such as "all number constants are generalized by deleting the proposition which states which
particular number the node is and just leaves the isa tag that says the node is a number" (ibid., pg.
48). A tantalizing piece of the algorithm is "If the number of a term is in the condition then the
sign of that term is also put in. There is no good rationale for this other than it works. The idea
that the sign is important probably develops earlier on in the textbook." (ibid., pg. 48). If Neves'
conjecture is right, then the student has acquired a constraint on generalization, an exciting instance
of learning to learn better. Neves' use of domain-specific constraints in the highly non-innate
domain of algebra logically entails that some kind of constraint acquisition must have occurred.
Unfortunately, he doesn't investigate the matter.

ALEX learns very quickly. It can acquire workably general knowledge of an algebraic
transformation from a single example. The textbook Neves used to teach ALEX algebra was a high
school textbook, but the lessons he chose concerned material that is often taught in primary school.
It is quite likely that the lessons are actually review lessons. The lessons go through the material
very quickly, much too quickly for Sierra, in fact. ALEX has such finely tuned biases for algebra
induction that it is able to recover correct algebraic transformations even from these abbreviated
review lessons. This leads to the conjecture that ALEX might make a good model for how people
relearn material they used to know. Perhaps all they retain is induction biases. They use these to
recover the procedure, whenever necessary, and forget the procedure itself.

4.4 Exactly one disjunct per lesson

A basic idea of step theory is to convert a difficult induction problem, induction with
disjunctions, into a series of simple, disjunction-free induction problems. The easiest way to make
this notion precise is to stipulate that there is exactly one disjunct acquired in each lesson. From
the teacher's point of view, the sequence of examples is partitioned into lessons so that each lesson
exemplifies one, and only one, new subprocedure (disjunct). From the learner's point of view, each
lesson's examples are to be assimilated via disjunction-free induction. However, students have short
attention spans and schools have schedules. A subprocedure might be too complicated to complete
in the hour that is allotted to its lesson. So, it may be that some subprocedures must be taught in
several lessons, in contradiction to the hypothesis. This possibility can be swiftly checked by
examining the lesson sequences used in textbooks.

In most textbooks, there are lessons that are clearly intended to generalize a subprocedure
taught earlier, rather than introduce a new one. For example, Houghton Mifflin's 1981 text
introduces borrowing on two-column problems, such as a:

5 4 5 4^ 

a, 6^0 b, 7 6^2 c, 6^6 7 d, 6 5^2 

-23 -436 -263 -367 

27 316 394 196 

The next lesson uses examples, such as b, where the borrowing is from the tens column into the
ones. The third lesson uses problems like c, where borrowing is in the left two columns. The fourth
lesson teaches adjacent borrowing, as in d. Other textbooks use similar lesson sequences.
McGraw-Hill's 1981 textbook series omits the b lesson. Scott-Foresman's 1975 series compresses b
and c into one lesson.

Under the hypothesis that there is exactly one new subprocedure per lesson, each of these
lessons would start up a new borrowing subprocedure. Thus, the four lessons above would generate
four borrowing procedures, one for each kind of problem. In particular, there would be distinct
borrowing procedures for the units column and for the tens column. This would have several
implications. When borrowing from zero is taught, it is always taught with three-column problems
such as e:
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7 9 4 1 ^ 7 9 

e, 8^0^5 f, 8^0^5 g, 8"0^5 

-217 -217 -297 

5 8 8 8 "8 



This lesson would leave the borrowing procedure for the tens column totally unaffected. When the
tens column is reached, the problem state is as shown in f. No borrowing is needed. Indeed, tens
column borrowing is never needed when there is a borrow from zero (henceforth, BFZ) originating
in the units column (see problem g). Consequently, the tens-column borrowing procedures are
never invoked during the BFZ lesson. There is no reason to modify them. Leaving them alone
makes a prediction that students will impasse if given problems that require borrowing from zero in
the tens column, as h does:



7 9 7 

h, 8^0*5 4 1, 8^0^5 4 

-2171 -2171 

6883 5283 



Since the tens column borrow doesn't know how to BFZ, it will attempt to decrement the zero in the
hundreds, violating a precondition, and reaching an impasse. Repairing this impasse would lead to
bugs that have never been observed. Problem i shows the work of one such unobserved bug. I
find such bugs implausible, but they are not star bugs.

However, no curriculum that I have seen teaches BFZ for the tens column in an explicit lesson.
This means that no student will learn the correct procedure. They will all have bugs such as i. This
is clearly a false prediction since many students eventually master subtraction. These students must
have learned subtraction in ways not described by the theory (a situation that Occam's razor
counsels us to avoid) or the original hypothesis that each lesson must start a new subprocedure is
wrong.

To sum up: if there must be a new subprocedure per lesson, then there must be several
distinct borrow subprocedures since several lessons are used in teaching simple borrowing. The
correct algorithm requires that each be amended to handle borrowing from zero. Yet only one BFZ
lesson occurs. Therefore, students should not be able to learn the correct procedure. Yet many do,
so the hypothesis must be wrong.

4.5 Minimal disjuncts vs. one-disjunct-per-lesson

This section discusses two solutions to the disjunction problem and contrasts them.

Always introducing a new subprocedure with a lesson has been shown to be empirically
inadequate. A somewhat more complicated hypothesis is that the learner starts a new subprocedure
for a lesson only if a new subprocedure is needed. If the lesson's worked example exercises can be
handled by generalizing previously acquired material, then a new subprocedure is not added. This
is the solution to the disjunction problem adopted by step theory. It is a felicity condition called
one-disjunct-per-lesson.

Earlier it was shown that barring disjunctions from the representation language forced
overgeneralization. This suggests a bias: allow disjunctions in the representation, but don't use a
disjunction unless it is absolutely necessary. That is, the inducer prefers generalizations that have
the minimal number of disjuncts. This will be called the minimal-disjuncts bias. This bias is a
common technique in induction. The familiar arch learning domain will be used to illustrate it.

Iba (1979) used a minimal-disjunction bias to solve the arch learning problem without prior
knowledge of PRISM. He begins the induction by giving the learner two positive examples, the
arch with the BRICK lintel and the arch with the WEDGE lintel. Since the inducer doesn't know the
PRISM concept, it induces that the lintel can be any kind of block. It overgeneralizes. Iba then
gives the inducer a negative example, a (pseudo-)arch with a pyramid as the lintel. This negative
example is matched by the current arch concept. This means that the current arch is too general.
It shouldn't match a negative example. Making the concept more general won't help, of course.
The only possible response is to form a disjunction. In this case, an appropriate disjunction would
be to describe the lintel as (OR 'BRICK 'WEDGE). This illustrates what it means for an inducer
to make the fewest disjuncts possible. The learner only inserts a disjunction when it has to. This
minimal-disjuncts bias is one solution to the disjunction problem.
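
The arch episode can be sketched as a small Python program. This is a minimal illustration of the minimal-disjuncts idea, not a reconstruction of Iba's program: the type hierarchy in PARENT, the set-of-type-names encoding of a concept, and the function names are all assumptions made for the example.

```python
# Hypothetical type hierarchy: the learner knows BLOCK but has no PRISM concept.
PARENT = {"BRICK": "BLOCK", "WEDGE": "BLOCK", "PYRAMID": "BLOCK", "BLOCK": None}

def ancestors(kind):
    """Return kind followed by its ancestors, most specific first."""
    chain = []
    while kind is not None:
        chain.append(kind)
        kind = PARENT[kind]
    return chain

def covers(concept, kind):
    """A concept is a set of type names (a disjunction); it covers kind if
    any of its names is kind itself or one of kind's ancestors."""
    return any(name in ancestors(kind) for name in concept)

def induce_lintel(positives, negatives):
    """Prefer a disjunction-free description; disjoin only when forced to."""
    single = None
    for candidate in ancestors(positives[0]):
        if all(covers({candidate}, p) for p in positives):
            single = {candidate}  # most specific single type covering all positives
            break
    if single is not None and not any(covers(single, n) for n in negatives):
        return single
    # The single-type description matches a negative example, and generalizing
    # further cannot help, so fall back to a disjunction of the observed types.
    return set(positives)

before_negative = induce_lintel(["BRICK", "WEDGE"], [])
after_negative = induce_lintel(["BRICK", "WEDGE"], ["PYRAMID"])
```

Before the pyramid is shown, the sketch settles on the overgeneral BLOCK; after it, the only description left is the disjunction of BRICK and WEDGE.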

The main difference between the minimal-disjuncts bias and one-disjunct-per-lesson is that the
minimal-disjuncts bias would detect the beginning of a new subprocedure even if it were in the
middle of another subprocedure's lesson. That is, one-disjunct-per-lesson is a restriction on the
minimal-disjunction hypothesis. Anything that one-disjunct-per-lesson can induce can also be
induced by the minimal-disjunction bias, but not conversely. Hence, a critical case to look for is
one where a subprocedure begins in the middle of another subprocedure's lesson. This would argue
conclusively in favor of the minimal-disjuncts bias.

Leading zero suppression 

I know of just one case where it could be argued that a disjunction must be started in the
middle of a lesson. However, the evidence is rather unclear. It concerns leading zero suppression.
Mastery of subtraction requires that the student suppress zeros in the answer if the zeros would be
the leftmost digits. The answer to 58-50 is 8, not 08. This subskill is never given a lesson of its
own in any of the textbooks that I've examined. Occasionally, the examples demonstrating another
subskill (e.g., borrowing) will suppress a leading zero. But there are no lessons devoted solely to
teaching the circumstances under which zeros should be left off the answer. Yet many students
succeed in learning leading zero suppression. This would seem a rather clear piece of evidence
against one-disjunct-per-lesson. However, the leading zero story is actually quite complex. Only the
main points can be covered here.

If the minimal-disjuncts bias is to explain the acquisition of leading zero suppression, the
textbooks would have to have a wide variety of leading-zero examples. The following examples
illustrate the kind of variety needed:

4 0 4 

a, 6 7 b. 7 6^ c. 1^6 7 d. 6*6 7 

-63 -736 "63 -462 



1 6 9 4 9 5 

0 9 4 
7 6 8 f . 1*0^7 6 64 

766 -99 -647

Under the minimal-disjuncts bias, it doesn't much matter where these examples occur, but they do
have to occur somewhere. However, I have yet to see an example suppressing more than one zero
used in any subtraction lesson. Examples of multiple zero suppression, such as e, f or g, do not
appear. Despite their absence, some students acquire a complete understanding of leading zero
suppression. Subtraction is the only columnar computation that generates answers with leading
zeros; addition and multiplication do not. Long division often has subtraction problems that have
multiple leading zeros, but only as a subproblem to the whole division problem. Perhaps these
serve as examples for learning suppression of multiple leading zeros. However, many students are
suppressing leading zeros before instruction in long division begins. They must have learned the
skill some other way. In short, the evidence from the lesson data is not entirely clear. It doesn't
clearly support the minimal-disjuncts bias since that hypothesis would have just as much trouble as
one-disjunct-per-lesson in accounting for the acquisition of leading zero suppression.

Explaining why there are lessons

The strongest support for one-disjunct-per-lesson is that it explains why curricula are
constructed the way they are. One-disjunct-per-lesson uses the lesson boundaries but the
minimal-disjunction bias does not. A minimal-disjunction learner would learn equally well if the
partitioning imposed by lessons were removed, leaving a continuous stream of examples and
exercises. To the minimal-disjunction learner, lesson structure is irrelevant.

If lesson structure were irrelevant, then textbooks could be more simply laid out as a
continuous stream of examples, exercises and other material. The teacher would use the daily math
hour to get as far as possible through it. There would be no lesson boundaries. This is not how
current (or past) textbooks are structured. Yet why have teachers adopted this lesson-structured
format so universally? It can hardly be an accident or a fad. Teachers are dedicated and innovative
enough that they would have dispensed with the straitjacket of lesson structure if they found it
ineffective. To put it differently, if one accepts the nearly universal use of lessons as a natural
phenomenon worth explaining, then one-disjunct-per-lesson explains it but the minimal-disjuncts
bias does not. The one-disjunct-per-lesson hypothesis has greater explanatory adequacy than the
minimal-disjuncts hypothesis.

The minimal-disjuncts bias predicts that students would learn equally well from a "scrambled"
lesson sequence. To form a scrambled lesson sequence, all the examples in an existing lesson
sequence are randomly ordered then chopped up into hour-long lessons.* Thus, the lesson
boundaries fall at arbitrary points. The minimal-disjuncts bias predicts that the bugs that students
acquire from a scrambled lesson sequence would be the same as the bugs they acquire from the
unscrambled lesson sequence. This empirical prediction needs checking. If it is false, as I am sure
it is, then the minimal-disjuncts bias can be rejected on empirical as well as explanatory grounds.

In short, we've arrived via a circuitous route at the felicity conditions thesis. It holds that
teacher-student communication is a conversation of sorts that is governed by tacit conventions. The
conventions facilitate learning. Perhaps it would be fun to close this discussion with a little
speculation.



Suppose one goes a step further than the felicity conditions thesis and conjectures that the
felicity conditions that exist are those that optimize the information transmission. To solve the
learner's disjunction problem, the teacher's optimal strategy would be to point to a node in the
learner's knowledge structure and say "disjoin that node with the following subprocedure: ..."
Clearly, this is impossible. So the teacher says the next best thing: "Disjoin some node with the
following subprocedure." The learner has to figure out which node to disjoin because the teacher
can't point to it. But the learner knows now that some disjunction is necessary and that the
examples following the teacher's command will determine its contents (this is the
exactly-one-disjunct-per-lesson hypothesis that was discussed in section 4.4). If it were not for the
exigencies of school scheduling, this would be perhaps the optimal information that felicity
conditions could transmit. However, lessons have to be about an hour long. This means that only
some of the lesson boundaries will correspond to the teacher's command to start a new disjunction.
The other lessons will finish up the previous lesson. In short, the optimal feasible felicity condition
for information transmission could well be the one-disjunct-per-lesson bias.

* Randomly ordering the examples would introduce a confounding effect. Examples from late
lessons could appear before any of the examples of the preceding lesson. For instance, the subskill
of borrowing-from-zero could be exemplified before borrowing-from-non-zero. Here is a
scrambling without the confound. Suppose that the examples in the first lesson of the original
sequence are labelled 1.1, 1.2, 1.3, etc. The examples of the second lesson are 2.1, 2.2, 2.3, etc. The
other lessons' examples are similarly labelled. The scrambled lesson sequence is: 1.1, 2.1, 3.1, 4.1,
etc. for the first lesson; then 1.2, 2.2, 3.2, 4.2, etc. for the second lesson, and so on. The scrambled
lesson sequence introduces all of the procedures in the first lesson, then reviews them in each of the
following lessons. If the minimal-disjuncts bias holds, this scrambled lesson sequence should yield
the same bugs as the unscrambled lesson sequence.
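
The confound-free scrambling described in the footnote to the scrambled-lesson discussion amounts to transposing the lesson sequence. A minimal Python sketch follows; the function name and the use of label strings such as "2.1" to stand for examples are assumptions made for illustration.

```python
def scramble(lessons):
    """Build the scrambled sequence: lesson k of the result takes the k-th
    example from every original lesson, preserving the original lesson order."""
    longest = max(len(lesson) for lesson in lessons)
    return [
        [lesson[k] for lesson in lessons if k < len(lesson)]
        for k in range(longest)
    ]

# Four original lessons with three examples each, labelled lesson.example.
original = [
    ["1.1", "1.2", "1.3"],
    ["2.1", "2.2", "2.3"],
    ["3.1", "3.2", "3.3"],
    ["4.1", "4.2", "4.3"],
]
scrambled = scramble(original)
```

Within each scrambled lesson the examples still appear in the original lesson order, so no late subskill is exemplified before an earlier one, yet every lesson boundary now falls at a point that has nothing to do with the introduction of a subprocedure.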



4.6 Formal hypotheses 

The basic solution to the disjunction problem that people use has been uncovered. What
remains is to express that hypothesis clearly and precisely. Three functions, named Disjoin,
Induce and Practice, will be used to formalize one-disjunct-per-lesson. The functions will not
be defined. Instead, they will gradually acquire meaning as the constraints of step theory are stated
in terms of them. The previously undefined function Learn will be defined in terms of them. The
following is a list of the nomenclature, some of it duplicated from the previous chapter, with
comments on their intended meanings.

(Examples L)      A function that returns the worked examples contained in its argument,
                  which is a lesson.

(Exercises L)     A function that returns the practice exercises contained in its argument,
                  which is a lesson.

(Induce P XS)     An undefined function that returns a set of procedures. Its first argument,
                  P, is a procedure, and its second argument, XS, is a sequence of examples.
                  It represents the various ways that its input procedure can be generalized to
                  cover the examples. If there is no way to generalize the input procedure to
                  cover the examples, Induce returns the null set.

(Practice P XS)   An undefined function that returns a set of procedures. Its first argument,
                  P, is a procedure, and its second argument, XS, is a sequence of exercise
                  problems. The output procedures correspond to the various ways that the
                  input procedure can be generalized in order to solve the given problems.

(Disjoin P XS)    An undefined function that returns a set of procedures. Its first argument,
                  P, is a procedure, and its second argument, XS, is a sequence of examples.
                  It represents the insertion of a new subprocedure (disjunct) into the given
                  procedure. Since there are sometimes several ways to do this, it returns
                  several different procedures.

With these new terms in hand, the felicity condition can be formally stated. P and L stand for a
procedure and a lesson:

One-disjunct-per-lesson
Let

    (Learn P L) =

        if (Induce P (Examples L)) ≠ {} then (Learn1 P L)
        else (Learn2 P L)

where

    (Learn1 P L) =

        { P'' | ∃ P' such that P' ∈ (Induce P (Examples L))
                and P'' ∈ (Practice P' (Exercises L)) }

and

    (Learn2 P L) =

        { P'' | ∃ P' such that P' ∈ (Disjoin P (Examples L))
                and P'' ∈ (Learn1 P' L) }.

Moreover, (Induce P XS) and (Practice P XS) do not introduce into P any new
disjunctions or any new disjuncts on old disjunctions, and (Disjoin P XS) inserts into P
exactly one new disjunction or one new disjunct on an old disjunction.

The function Learn produces the set of procedures that can be acquired from a given lesson and a
given initial procedure. If the procedure can be generalized without adding disjunctions, then no
disjunction is introduced (the Learn1 case). If there is no such generalization, then a disjunction is
introduced and the resulting procedure is generalized (the Learn2 case). Learn only introduces a
disjunction if it has to. Learn is defined in terms of the three main undefined functions. Hence,
it is just as undefined as they are. In fact, the last two clauses use the term "disjunction," which
has not been formally defined. However, enough examples of disjunctions have been given that it
should be clear what is meant even without a formal definition. The formal definition must await
definition of the representation language used for procedures.
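
The control structure of Learn can be sketched in Python with the three undefined functions passed in as parameters. Everything concrete below is a toy assumption made for illustration: a procedure is modelled as a frozenset of disjunct names, an example is the name of the disjunct it requires, and toy_induce, toy_practice and toy_disjoin are hypothetical stand-ins, not the theory's Induce, Practice and Disjoin.

```python
def learn(p, lesson, induce, practice, disjoin):
    """One-disjunct-per-lesson: disjoin only if disjunction-free induction fails."""
    examples, exercises = lesson["examples"], lesson["exercises"]

    def learn1(proc):
        # Generalize over the examples, then over the practice exercises.
        return {p2 for p1 in induce(proc, examples) for p2 in practice(p1, exercises)}

    if induce(p, examples):          # (Induce P (Examples L)) is not the null set
        return learn1(p)
    # Learn2: insert exactly one disjunct, then proceed as in Learn1.
    return {p2 for p1 in disjoin(p, examples) for p2 in learn1(p1)}

def toy_induce(proc, examples):
    """Succeeds (without adding disjuncts) only if every example is already covered."""
    return {proc} if all(x in proc for x in examples) else set()

def toy_practice(proc, exercises):
    """Practice leaves the toy procedure unchanged."""
    return {proc}

def toy_disjoin(proc, examples):
    """Each way of adding exactly one uncovered disjunct yields a candidate."""
    return {proc | {x} for x in examples if x not in proc}

start = frozenset({"diff"})
review = learn(start, {"examples": ["diff"], "exercises": []},
               toy_induce, toy_practice, toy_disjoin)
one_new = learn(start, {"examples": ["diff", "borrow"], "exercises": []},
                toy_induce, toy_practice, toy_disjoin)
two_new = learn(start, {"examples": ["borrow", "bfz"], "exercises": []},
                toy_induce, toy_practice, toy_disjoin)
```

In this toy setting a review lesson returns the procedure unchanged, a lesson with one new subprocedure returns the procedure with one added disjunct, and a lesson that needs two new disjuncts returns the empty set, since no single insertion lets the examples be covered.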
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Chapter 5 
The Invisible Objects Problem



The disjunction problem is perhaps the most famous problem in induction. A less well
known but equally crucial problem concerns what could be called invisible objects. An invisible
object is something that is not present in an example given to the inducer but is nonetheless
relevant to the generalization being induced. In the domain of mathematical calculations, invisible
objects are usually numbers. For instance, suppose the learner sees the teacher write 5 in a:

    a.    4 7        b.    5 0
        - 1 2            - 2 3
            5                6

Consider two generalizations that explain the 5. The 5 is the difference of the digits in the units
column, 7-2, or it is a more complicated combination of visible digits: (4+2)-1. The latter
requires an invisible object, 6, the result of 4+2. Example b is consistent with the second
generalization but not with the first. Its invisible object is 8.

The arch learning task provides another illustration of the invisible objects problem. One
characteristic of an arch is that it has a gap right in the middle of it. The gap must be between the
two legs, directly under the lintel, and directly on the supporting surface. As Winston points out
(1975), one way to represent a gap is to use an invisible brick. Given this representational
construct, an arch can be represented as

    (AND (ISA LINTEL 'PRISM)
         (ISA LEG1 'BRICK)
         (ISA LEG2 'BRICK)
         (ISA GAP 'BRICK)
         (INVISIBLE GAP)
         (SUPPORTS LEG1 LINTEL)
         (SUPPORTS GAP LINTEL)
         (SUPPORTS LEG2 LINTEL) ...)

The representation uses (INVISIBLE GAP) to indicate that the variable GAP is bound differently
than the other variables when the pattern is matched. GAP can be bound only to "invisible bricks"
while the other variables can be bound only to visible bricks. As it turns out, Winston does not use
invisible object variables. His representation requires all variables to be bound to visible objects.
The relationship (NOT (TOUCHING LEG1 LEG2)) is used to express the gap between the arch's
legs.

Although an explicit device, such as INVISIBLE, can be used to specify whether the objects
bound to a variable are visible or not, a more common representation convention is to use functions
to designate invisible objects and variables to designate visible ones. For instance, the arch's gap
could be expressed using a distance function:

    (> (DISTANCE/BETWEEN LEG1 LEG2) 0)

The output of the distance function is an invisible object, a number. The arch concept states a
constraint on this invisible object, that it be greater than zero. Functions can be used wherever
variables can be used. Under this convention, the only difference between what a variable can
designate and what a function can designate is that the variable's referent must be a visible object
(cf. Hempel's definition of confirmation, 1946). The syntactic distinction between function and
variable replaces INVISIBLE as a way to control invisibility. Using functions to control invisibility
is only a syntactic device. Any n-ary function can be converted to an (n+1)-ary relation, thereby
allowing a variable to be bound to its output. Similarly, an (n+1)-ary relation can be converted to
an n-ary, set-valued function. In principle, the representation has total freedom to control
invisibility. Instead of INVISIBLE, it uses syntax. The net effect is the same.
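
The two candidate generalizations for the written digit can be checked mechanically. The Python sketch below reads problem a as 47 - 12 and problem b as 50 - 23, as the expressions 7-2 and (4+2)-1 in the text imply; the function names and the (tens, units) tuple encoding are assumptions made for illustration.

```python
def units_difference(top, bottom):
    """First generalization: the difference of the units digits."""
    return top[1] - bottom[1]

def via_invisible_sum(top, bottom):
    """Second generalization: (top tens + bottom units) - bottom tens."""
    invisible = top[0] + bottom[1]  # an intermediate number that is never written
    return invisible - bottom[0]

# Each entry: (top digits, bottom digits, digit the teacher wrote).
examples = {
    "a": ((4, 7), (1, 2), 5),
    "b": ((5, 0), (2, 3), 6),
}

consistent = {
    name: [g.__name__ for g in (units_difference, via_invisible_sum)
           if g(top, bottom) == written]
    for name, (top, bottom, written) in examples.items()
}
```

Example a alone cannot separate the two generalizations; only example b rules out the units-difference reading, and the learner has no way to see the invisible 6 or 8 on the page.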



The invisible object problem 

Induction's troubles with invisibility come when the representation allows an expression to be
expanded arbitrarily by adding constructions that designate invisible objects. Given an example, the
learner can't see what invisible objects might be involved in the target generalization. The learner
may make some educated guesses about which invisible objects are relevant, perhaps, then see if
they play the same roles in the second example as they did in the first. Because the representation
allows so many choices, the learner's problem of finding the relevant invisible objects is very hard
(indeed, it will be shown later to be unsolvable). For instance, if Winston allowed invisible bricks,
then they could be lying around anywhere. The learner would have no way to know if there were
just one invisible brick, the gap, or dozens lying about all jumbled up. Similarly, if Winston
allowed distance functions and the usual arithmetic functions, then the learner couldn't discriminate
between

    (> (DISTANCE/BETWEEN LEG1 LEG2) 0)

and

    (> (ADD (DISTANCE/BETWEEN LEG1 LEG2)
            (DISTANCE/BETWEEN LEG1 LEG2))
       0)

The ADD function introduces a second invisible object, which is distinct from the one introduced by
DISTANCE/BETWEEN. The learner has no way to know whether or not this new invisible object is
worthy of description.

A better illustration of the invisible objects problem is provided by Langley's BACON.3 program
(Langley, 1979). It induces physical laws given tables of idealized experimental data. For instance,
it can induce the general law for ideal gases when it is given "experiments" such as this one:

    (AND (MOLES 1.0)
         (TEMPERATURE 300.0)
         (PRESSURE 300000.0)
         (VOLUME 0.008320))

This formal representation describes the experiment in the same way that Winston's representation
described a scene in the blocks world (this is not the representation that BACON.3 uses, by the way).
The expression above says that there is one mole of gas at a certain temperature and pressure,
occupying a certain volume. The goal of BACON.3 is to find a description, that is, a generalization
of the experiments that it is given. For this series of experiments, the generalization that it induces
is:

    (AND (MOLES N)
         (TEMPERATURE T)
         (PRESSURE P)
         (VOLUME V)
         (CONSTANT
           (QUOTIENT
             (TIMES P V)
             (TIMES N T))))

That is, PV/NT is a constant. This is one way to express the ideal gas law, which is more widely
known as PV=NRT, where R = 8.32. In the representation above, notice that the last clause is a
composition of functions that hides the intermediate results PV and NT. These intermediate results
do not appear in the "scene" described earlier. This is what makes BACON.3's job hard. BACON.3's
method for solving this induction problem is, very roughly speaking, to guess useful invisible object
descriptors and enter their values in the scenes. It might start by forming all binary functions on the
visible objects, e.g., NT, P+V, N/N, PP, P/T, etc. Since none of these yield values (invisible objects)
that are constant across all the scenes, it tries further compositions: NT/PV, NT+PV, NTPV, etc. At
this level, it succeeds, since PV/NT turns out to be the same value, 8.32, in all the scenes.
Essentially, BACON.3 forms the simplest polynomial that is consistent with the scenes, where "simple"
is defined computationally by the way that BACON.3 organizes its search. Roughly speaking, it
prefers the polynomial with the fewest intermediate terms (invisible object designators). It solves
the invisible object problem by choosing a generalization with a minimal number of invisible object
designators.
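
The flavor of this search can be conveyed by a small Python sketch. It is not Langley's algorithm: as a simplifying assumption it considers only products and quotients (no sums), represents each term by its exponents on N, T, P and V, and uses made-up scenes whose volumes are computed from PV = NRT with R = 8.32. The first scene matches the one given in the text; the others are invented.

```python
import math
from itertools import product

VARS = ("N", "T", "P", "V")
R = 8.32  # the constant quoted in the text

# Idealized "experiments"; V is computed so that every scene obeys PV = NRT.
scenes = [
    {"N": n, "T": t, "P": p, "V": R * n * t / p}
    for n, t, p in [(1.0, 300.0, 300000.0), (2.0, 300.0, 300000.0),
                    (1.0, 320.0, 200000.0), (3.0, 280.0, 150000.0)]
]

def value(exponents, scene):
    """Evaluate the monomial N^a * T^b * P^c * V^d on one scene."""
    return math.prod(scene[v] ** e for v, e in zip(VARS, exponents))

def is_constant(exponents):
    """True if the term takes (nearly) the same value in every scene."""
    values = [value(exponents, s) for s in scenes]
    return all(math.isclose(v, values[0], rel_tol=1e-9) for v in values)

def search(max_depth=3):
    """Breadth-first search: each level multiplies or divides pairs of known terms."""
    known = {tuple(int(i == j) for j in range(len(VARS))) for i in range(len(VARS))}
    for depth in range(1, max_depth + 1):
        new = set()
        for a, b in product(known, repeat=2):
            for sign in (1, -1):  # +1 multiplies the terms, -1 divides them
                c = tuple(x + sign * y for x, y in zip(a, b))
                if any(c):        # skip the trivial term that always equals 1
                    new.add(c)
        new -= known
        hits = sorted(t for t in new if is_constant(t))
        if hits:
            return depth, hits
        known |= new
    return None

depth, hits = search()
```

No term built from a single pair of visible quantities is constant across these scenes; the search succeeds one level deeper, where PV/NT (exponents -1, -1, 1, 1) and its reciprocal appear. Stopping at the shallowest level that yields a constant is what gives the sketch its preference for few invisible object designators.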

Four potential solutions to the invisibility problem will be discussed:

1. Banning invisibility: The knowledge representation language for mathematical procedures is
defined so that no constructions designate invisible objects. This is the approach taken by
Winston's (NOT (TOUCHING LEG1 LEG2)) solution.

2. Unbiased induction: Enough examples are provided to the learner that all invisible object
designators except the appropriate ones are eventually eliminated.

3. Minimal invisibility: The learner is biased to choose generalizations with the fewest invisible
object designators (e.g., the fewest functions, if functions are what designate invisible objects).
This is roughly what BACON.3 does. (See also Brown's work on inducing kinship relations, 1972;
1973.)

4. Show work: First, the target concept is taught in such a way that all objects that would
normally be invisible are somehow made visible. Then, the learner is retaught the target
concept, this time with the invisible objects invisible. The learner's task during the second
lesson is only to discover which of the visible object designators that it already knows is now
being used to designate an invisible object.

This chtipier will lake each hypothesis in order. Tlie show-work hypothesis will bc^shown to 
engender the best empirical and explanatory adequacy. 
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5.1 Burring invisible objects 

l>ie simplest wa> to htindlc the invisible objects problem is to bar jniisible objev,t Jcsignators 
Trom the representatton langiyge. But this svill not viork Tor the domain or m^ithematical 
procedures. Constructions for designatmg inusible objects are needed so that one cm represent 
pnxredures such as long column addition. Long column addition soKes problems such as 

3 
4 
5 
+6 

The ordinary student solves this without jotting down intermediate results. The student keeps a
running sum mentally. This requires some construction in the representation language that can
designate invisible objects, namely the intermediate sums. Therefore, invisible objects cannot be
barred from the representation language.

5.2 Unbiased induction with lots of examples 

It is not hard to see that the invisible object problem is just as unsolvable as the disjunction
problem. It is unsolvable in the sense that adding more examples doesn't narrow the set of
consistent generalizations down to a singleton set. In some domains, one can even prove that it is
unsolvable. Polynomial induction (e.g., BACON3) is a classic case that is particularly relevant to the
domains addressed by this theory. Given a set of number pairs, {..., <x_i, y_i>, ...}, the task is to
induce a polynomial function f such that f(x_i) = y_i for all i. Such functions are generalizations of the
set of example pairs. This induction task allows invisible objects in the representation. They are
the intermediate results of the polynomials. A relational representation of the polynomial function
y = x^2 + 1 is

(AND (PAIR x y)
     (TIMES Z x x)
     (INVISIBLE Z)
     (PLUS y Z 1))

Here Z is used to designate x^2, the intermediate result of x^2 + 1. Since it does not appear in the
example pair, it must be marked INVISIBLE.

If intermediate results (i.e., invisible numbers) were barred from generalizations (i.e.,
polynomial functions), then the problem of inducing polynomials from sets of pairs would be trivial.
When invisible numbers are allowed, it is unsolvable. That is, given any finite set of pairs, there are
infinitely many polynomial functions that generalize them. Proof: If there are n pairs, then there is
always an n-1 degree polynomial that fits them. An n degree polynomial could fit the n pairs plus
another pair, chosen randomly. Since there are an infinite number of possible extra pairs, there are
an infinite number of n degree polynomials that will fit the n pairs. Q.E.D.
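The proof can be made concrete with a small Python sketch. It uses Lagrange interpolation with exact rational arithmetic, which is my choice of technique for building the rival polynomials, not something the argument depends on. The three example pairs and the extra pairs are made up, and the name `lagrange` is hypothetical.

```python
from fractions import Fraction

def lagrange(pairs):
    """Return a function that evaluates the lowest-degree polynomial
    passing through all the given (x, y) pairs (x values must differ)."""
    pairs = [(Fraction(x), Fraction(y)) for x, y in pairs]

    def f(x):
        x = Fraction(x)
        total = Fraction(0)
        for i, (xi, yi) in enumerate(pairs):
            term = yi
            for j, (xj, _) in enumerate(pairs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total

    return f

# Three example pairs generated by y = x^2 + 1.
examples = [(0, 1), (1, 2), (2, 5)]

# Each choice of an extra pair at x = 3 yields a different generalization.
rivals = [lagrange(examples + [(3, extra_y)]) for extra_y in (10, 11, 12)]

for f in rivals:
    # Every rival is consistent with all the examples ...
    assert all(f(x) == y for x, y in examples)

# ... yet the rivals disagree about an unseen input.
print([f(4) for f in rivals])
```

Only the first rival is x^2 + 1 itself; the others agree with it on every example and differ elsewhere, which is exactly why more examples of the same kind never single out one polynomial.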

To pick an illustration closer to home, consider inducing the function nest that provides the
answer to the tens column in two-column subtraction problems, such as

   a.   7 2     b.   7 4     c.   3 6
      - 4 1        - 2 1        - 1 2
        3 1          5 3          2 4

Looking at a and b, an inducer might form the following generalizations:

1. A_10 = T_10 - B_10

2. A_10 = T_1 + B_1

3. A_10 = ((T_10 + T_1) - (B_10 + B_1)) - A_1

where the subscripts indicate the column, and T, B and A stand for the top, bottom and answer.
The first generalization is the correct one. The second generalization is that the tens answer is the
sum of the units column's digits. This second generalization, although consistent with examples a
and b, is inconsistent with c. Many such accidental generalizations can be eliminated by giving lots
of examples. However, generalization 3 cannot be eliminated. It will be true of any subtraction
problem of this kind (two columns, no borrowing). This shows that there are some absurd
generalizations, generalizations that students would never make, that would survive induction even
over an infinite number of examples. Students must be applying other constraints to the induction
process to eliminate this generalization, and many others like it.
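A short Python check illustrates the point. It is a sketch under assumptions: the problems are taken to be 72-41, 74-21 and 36-12, the three generalizations are written as functions of the column digits, and the names `columns`, `generalizations` and `survivors` are hypothetical.

```python
def columns(top, bottom):
    """Digits of a two-column subtraction problem that needs no borrowing."""
    answer = top - bottom
    return {"T10": top // 10, "T1": top % 10,
            "B10": bottom // 10, "B1": bottom % 10,
            "A10": answer // 10, "A1": answer % 10}

# Candidate generalizations for the tens-column answer A10.
generalizations = {
    1: lambda c: c["T10"] - c["B10"],
    2: lambda c: c["T1"] + c["B1"],
    3: lambda c: ((c["T10"] + c["T1"]) - (c["B10"] + c["B1"])) - c["A1"],
}

def survivors(examples):
    """Numbers of the generalizations consistent with every example."""
    return [k for k, g in generalizations.items()
            if all(g(columns(t, b)) == columns(t, b)["A10"] for t, b in examples)]

print(survivors([(72, 41), (74, 21)]))            # after examples a and b
print(survivors([(72, 41), (74, 21), (36, 12)]))  # after a, b and c

# Generalization 3 survives every borrow-free two-column problem.
borrow_free = [(t, b) for t in range(10, 100) for b in range(10, 100)
               if t // 10 >= b // 10 and t % 10 >= b % 10]
print(3 in survivors(borrow_free))
```

Example c removes the accidental generalization 2, but no borrow-free example can separate generalization 3 from the correct generalization 1.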

5.3 Minimal number of invisible object designators 

A close consideration of long column addition supports the idea that students might be biased 
to use as few invisible objects as possible. Students are introduced to long column addition with 
problems that have just three numbers to add. Given an example such as

3 
4 
+1 
8 

there are many ways to generate 8 from 3, 4 and 1. Each requires various intermediate results.
Some possibilities are: 



Concept          Number of intermediate results

4+4              0
3+4+1            1
4x3-3-1          2
4^2-3^2+1        3



Most of these potential generalizations will be eliminated by other examples. However, there will
always be many left, as shown in the preceding section. Unbiased induction will not tell the learner
which generalization to learn. In particular, some students learn long column addition correctly, so
they must be using some bias to choose among the many generalizations that are consistent with the
examples. If the learner is biased to pick the generalization with the fewest intermediate results, the
correct algorithm will be acquired.

There are many explanations one could give for why a learner might have such a bias. The
generalizations with the fewest invisible objects are also the ones with the fewest number fact
functions. It could be that the students are biased to choose short calculations because such
calculations are the easiest ones to perform. On the other hand, the students could also be biased
to reduce their short-term memory load. The generalizations with the fewest invisible objects are also
the procedures requiring the least use of short-term memory. These explanations are plausible.
Unfortunately, their predictions are indistinguishable from expressing the bias as a bias against
invisible objects. The learning data provides no way to split them. Until other data are collected, it
is a moot point whether the measure being minimized is fact function load, memory load, or
invisible objects.

An experiment with the minimal-something bias

Sierra was originally implemented to have the bias just discussed. On the first example of a
lesson, it would find all the fact function paths between the visible numbers of the example (subject
to an ad hoc upper bound on path length). Each example after the first would remove paths that
were inconsistent with its visible numbers. At the end of the lesson, Sierra would find and keep all
the minimal length paths. These paths were (a) consistent with all the examples, and (b) of minimal
length. These paths were the generalizations that Sierra generated as its predictions for what human
learners would choose. Sierra was able to learn correct subtraction and many subtraction bugs using
this bias. Ironically, long column addition, the procedure that provided the original motivation for
inducing invisible objects, also proved to be its undoing.
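The lesson procedure just described can be sketched as follows. This is a toy reconstruction, not Sierra's code: paths are modelled as expression trees over digit positions, the fact functions are limited to +, - and *, the bound of two fact functions is arbitrary, the second example column is made up, and the names `expressions`, `evaluate` and `learn` are hypothetical. For brevity the sketch enumerates all bounded paths first and then filters them by every example, which yields the same set as growing them from the first example.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def expressions(n_digits, max_ops):
    """All expression trees over digit positions that use at most
    max_ops fact functions, paired with the number of functions used."""
    by_size = {0: [("digit", i) for i in range(n_digits)]}
    for size in range(1, max_ops + 1):
        by_size[size] = [
            (op, left, right)
            for left_size in range(size)
            for left in by_size[left_size]
            for right in by_size[size - 1 - left_size]
            for op in OPS
        ]
    return [(size, expr) for size, exprs in by_size.items() for expr in exprs]

def evaluate(expr, digits):
    if expr[0] == "digit":
        return digits[expr[1]]
    op, left, right = expr
    return OPS[op](evaluate(left, digits), evaluate(right, digits))

def learn(lesson, max_ops=2):
    """Keep the paths consistent with every example, then keep the shortest."""
    paths = expressions(len(lesson[0][0]), max_ops)
    for digits, answer in lesson:
        paths = [(size, expr) for size, expr in paths
                 if evaluate(expr, digits) == answer]
    if not paths:
        return []
    shortest = min(size for size, _ in paths)
    return [expr for size, expr in paths if size == shortest]

# With one example, the shortest consistent path is the accidental 4+4.
print(learn([((3, 4, 1), 8)]))

# A second (made-up) example removes it; only ways of summing the column remain.
for expr in learn([((3, 4, 1), 8), ((2, 5, 1), 8)]):
    print(expr, evaluate(expr, (6, 2, 9)))
```

Even with only three digits and two fact functions there are several hundred candidate paths, which gives some feeling for why the path sets outgrow memory as the columns get longer.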

Sierra's problem with long column addition was in forming the recursive loop that would
allow it to solve problems with arbitrarily long columns. Given two-digit addition problems, it
would form one action, roughly (Write A (Add T B)). Given the next lesson, with triple-digit
problems, Sierra would form a second subprocedure, yielding a new procedure that could be
roughly expressed as

(If <triple-digit>
    then (Write A (Add T (Add M B)))
    else (Write A (Add T B)))

where T, M and B refer to the top, middle and bottom digits of a triple-digit column. The clue that
something is wrong is that Sierra did not use its knowledge of two-digit addition to help it learn
three-digit addition. There is no use of the two-digit addition embedded in the triple-digit
addition. Sierra developed the triple-digit function nest from scratch. However, because its bias
was lenient about invisible objects, it had no difficulty inducing the nested Add functions. Given
the next lesson, with four-digit columns, Sierra again added a new subprocedure, yielding a
procedure that could be roughly expressed as

(If <four-digit>
    then (Write A (Add T (Add TM (Add BM B))))
 elseif <triple-digit>
    then (Write A (Add T (Add M B)))
    else (Write A (Add T B)))

Sierra might have formed a recursion at this point, but it did not. Hence, the procedure it learned
is unable to do a five-digit column. But a human learner would, I expect, be able to solve a five-digit
column after this much tutelage (I have no data on long column addition). The reason Sierra did
not form the loop is that it couldn't recognize the three-digit problem hiding in the midst of the
four-digit problem. Parsing of problems pays special attention to boundaries. The boundary bias
must be present in order for Sierra to generate several key subtraction bugs (see the discussion of
Always-Borrow-Left, section 1.1). I think it would recognize the recursion given slightly longer
columns, but this is difficult to test (the set of possible paths gets too big for the computer's address
space when the paths are long). Since Sierra did not find the recursion until after four-digit
problems were presented, it became crucial to find out where in the lesson sequence human
students are first expected to form the recursion: at three-digit, four-digit or five-digit problems? If
Sierra was a good model of human learning, then human students would need longer problems than
four-digit ones to learn the loop. A second grade textbook was purchased (something that should
have been done long before). This led to the discovery of the show-work principle.
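The contrast between the procedure Sierra learned and the loop it failed to form can be sketched in Python. This is a paraphrase for illustration only: the function names are hypothetical, and treating a column of unexpected length as an error is my assumption about how "unable to do a five-digit column" would show up.

```python
def case_per_length_sum(digits):
    """One subprocedure per column length, as in the nested-Add procedure
    described above. Nothing learned for shorter columns is reused."""
    if len(digits) == 4:
        t, tm, bm, b = digits
        return t + (tm + (bm + b))
    if len(digits) == 3:
        t, m, b = digits
        return t + (m + b)
    if len(digits) == 2:
        t, b = digits
        return t + b
    # Assumption: a column length with no subprocedure simply cannot be solved.
    raise ValueError(f"no subprocedure for a column of {len(digits)} digits")

def recursive_sum(digits):
    """The recursion a learner is expected to form: the top digit plus
    the sum of the shorter column beneath it."""
    if len(digits) == 1:
        return digits[0]
    return digits[0] + recursive_sum(digits[1:])

print(case_per_length_sum([3, 4, 5, 6]))   # four-digit column: both agree
print(recursive_sum([3, 4, 5, 6]))
print(recursive_sum([3, 4, 5, 6, 7]))      # five-digit column: only the loop copes
```

The recursive version embeds the shorter problem inside the longer one, which is exactly the relationship that Sierra's boundary-sensitive parsing kept it from seeing.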

5.4 Show work

In almost all cases, textbooks do not require the student to do invisible object induction.
Instead, whenever the text needs to introduce a subskill that has a mentally held intermediate result,
it uses two lessons. The first introduces the subskill using special, ad hoc notations to indicate the
intermediate results. Figures 5-1 and 5-2 show some examples. Since the intermediate results are
written out in the first lesson, the students need guess no invisible objects in order to acquire the
subskill. The learning of this lesson may proceed as if invisible object designators were banned
from the representation language.

The second lesson teaches the subskill again, without writing the intermediate results. The
second lesson is almost always headed by the key phrase, "Here is a shorter way to X," where X is
the name of the skill. The students are being instructed that they will be doing exactly the same
work (i.e., the same path of fact functions). They are left with the relatively simple problem of
figuring out how the new material relates to the material they learned just the day before. This
kind of learning might be called optimization learning. It is similar to induction. Indeed, I believe
Sierra could be easily modified to handle optimization learning. However, subtraction curricula
have no optimization lessons. (They would if teachers taught students to suppress scratch marks,
but most do not these days.) Without instances of optimization learning, the bug data will not help
in discovering what is the right way to formulate such learning. Optimization learning remains a
topic for future investigation.

These considerations motivate the following hypothesis:

Show-work

In worked examples of a lesson, all objects mentioned by the new subprocedure are
visible, unless the lesson is marked as an optimization lesson.

This hypothesis is not as formal as others, although its intended meaning is clear. Later, its formal
impact will be built into the knowledge representation language. Essentially, functions will be
prohibited in certain areas of the representation and strongly limited in others. The details, which
depend on the representation's syntax, are deferred until section 15.1.

The show-work hypothesis is quite clearly a felicity condition. Neither the teacher nor the
student must obey it. Yet when they do, it is easier to transmit information. In Sierra, the
combinatorics of collecting function nests can be almost entirely avoided. Presumably, human
learners may also find learning easier.

Figure 5-1

Three formats for column addition obeying the show-work principle.
Exercises appear unsolved on the left, solved on the right.

Figure 5-2

Other exercise formats obeying the show-work principle.
Exercises appear in normal format on left, in show-work format on right.

5.5 Invisible objects, disjunctions, and Occam's Razor

Occam's Razor is usually given a twofold interpretation. Webster's dictionary says Occam's
Razor "is interpreted as requiring that the simplest of competing theories be preferred to the more
complex or that explanations of unknown phenomena be sought first in terms of known quantities."
If "simplest" means fewest disjunctions, then step theory claims that learners obey the first dictum
of Occam's Razor. This was discussed in the preceding chapter. This chapter could be construed as
showing that learners also obey the second dictum. For instance, if the learner seeks to explain
where the teacher got the 6 from in the example

  9
- 3
  6

then Occam's Razor advises explaining it as 9-3, a function of known (i.e., visible) entities, rather
than some unknown (invisible) entity, e.g., the sum of numbers less than three, the student's age,
the phase of the moon, etc. As Occam's Razor suggests, the invisible objects problem is a general
problem, one that concerns almost any inductive account of knowledge acquisition. Its importance
is highlighted by the apparent fact that teachers and learners have a special convention for solving
it, the show-work felicity condition.

The invisible object problem and the disjunction problem are similar in many respects. Both
can be solved trivially by barring their respective representational devices. This is not an option in
this domain because mathematics procedures use both disjunctions and invisible objects. Both the
invisible object problem and the disjunction problem are unsolvable by unbiased induction. If the
class of all possible generalizations allows free use of them, then there are infinitely many
generalizations consistent with any finite set of examples. Hence, both the disjunction problem and
the invisible object problem require biased induction. In both cases, an empirically plausible bias is
based on minimizing the uses of the respective devices (i.e., induction prefers generalizations with
the fewest disjuncts and the fewest invisible object designators). However, these biases do not
explain why lessons have the format that they do have. Better hypotheses are based on the idea of
felicity conditions, conventions that make learning easier. The felicity condition hypotheses not
only fit the facts, they also explain lesson formats as conventions for facilitating knowledge
communication. They have the same observational adequacy as the minimization-based hypotheses,
but they have more explanatory adequacy. They actually tell us something about why that
mammoth cognitive-culture artifact, our educational system, has the properties that it does.

In the preceding chapters, the focus was on explaining learning. It was found that inductive
learning could explain the gross features of student learning, provided that two felicity conditions
were included in the explanation. However, there are two distinct but symbiotic foci of empirical
curiosity to this investigation. Finding out how students learn is one. The other is finding out what
causes them to have bugs. In this chapter and the next, the emphasis will be on explaining bugs.
This chapter will introduce some bugs and bug migrations that will be referred to throughout this
document.

Given the show-work felicity condition and the one-disjunct-per-lesson felicity condition,
inductive learning will converge. Given sufficient examples, an inducer will construct a large set of
procedures. All the procedures will, by definition, be consistent with all the instructional examples.
However, most will be buggy procedures instead of correct procedures. One cause is
overgeneralization. An example of overgeneralization was described in section 2.7. Sierra's learner
was given examples illustrating how to borrow from zero (henceforth, BFZ will be used to
abbreviate "borrow from zero"). However, the learner overgeneralized the condition for executing
the BFZ subprocedure, generating a bug that performs the BFZ subprocedure both for zero and for
one (i.e., for identity elements). This is just one example of how learning can generate bugs.

This chapter argues that learning, and overgeneralization in particular, is a very powerful bug
generator. It can, in principle, generate any conceivable bug. It is almost irrefutable. Constraints
must be placed upon it if it is to have any explanatory value. But certain bugs are very difficult to
generate if such constraints are placed on learning. In order to make explanatory, constrained
learning empirically adequate, these bugs must be generated by another mechanism. The proposed
generative source is local problem solving. To put it differently, this chapter begins by contrasting
two positions:

1. an unconstrained learning theory, and

2. a constrained learning theory plus local problem solving.

Both can generate many bugs. However, when learning is used alone, it must be given so much
flexibility that it can no longer explain why certain bugs are observed and not others. It has less
explanatory adequacy.

6.1 Explaining bugs with overgeneralization

A simple learning framework bases explanations on overgeneralization. It explains errors as
resulting from correct induction from impoverished sets of examples. For instance, the bug
Diff-0-N=N, whose work appears in a:

                                  4
   a.   5 0     b.   5 6     c.   5 0
      - 1 6        - 1 0        - 1 6
        4 6          4 6          3 4

is explained as an overgeneralization of the correct rule N-0=N. The learner has seen examples
such as b but not examples such as c. Hence, the learner induces that N-0 = 0-N = N, which is
perfectly consistent with the instruction received so far. The overgeneralization framework is simple
because it does not postulate "mislearning" as a source of errors. Instead, all learned concepts are
consistent with the examples. Bugs arise only from overgeneralization, possibly in the context of
incomplete instruction.
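A few lines of Python make the argument concrete. This is a sketch under assumptions: the two rules are my paraphrases of N-0=N and its overgeneralization, the instructional examples are made up, and the names `correct_rule`, `overgeneral_rule` and `consistent` are hypothetical.

```python
def correct_rule(top, bottom):
    """N-0=N: applies only when the zero is in the bottom of the column."""
    return top if bottom == 0 else None

def overgeneral_rule(top, bottom):
    """If there is a zero in the column, write the other digit."""
    if bottom == 0:
        return top
    if top == 0:
        return bottom
    return None

def consistent(rule, examples):
    """True if the rule reproduces the answer digit of every example column."""
    return all(rule(top, bottom) == answer for top, bottom, answer in examples)

# Hypothetical instruction: only N-0 columns, as in problem b.
seen = [(6, 0, 6), (3, 0, 3), (9, 0, 9)]

print(consistent(correct_rule, seen), consistent(overgeneral_rule, seen))

# On an unseen 0-N column the two rules part company.
print(correct_rule(0, 6))      # no answer: the column needs a borrow
print(overgeneral_rule(0, 6))  # 6, which is the bug Diff-0-N=N
```

Both rules are perfectly consistent with the instruction, so nothing in the examples themselves tells the learner which one to keep.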

Simple overgeneralization is surprisingly powerful. The theorist can explain very diverse bugs
by using it. It can even be used to explain bug migration, as Derek Sleeman has pointed out
(Sleeman, submitted for publication). Before his suggestion is examined, an introduction to bug
migration is in order.



Bug migration

Bugs are not usually stable. It is uncommon for a student to have exactly the same bugs on
two tests, even if those tests are given only a day apart (see section 2.10). For instance, a student
might have bugs A and B on Monday, but on Wednesday, the student has bugs A and C instead.
Bug A was stable, but bug B was replaced by bug C. This is a kind of inter-test bug instability.
Inter-test bug instability is the norm rather than the exception. Only 4% of the bugs remained stable
in one study (VanLehn, 1981). Bunderson (1981) reports no stable bugs at all.

There is also intra-test instability. The bugs appear and disappear over the course of one
testing session. A student may have bugs A and B on the first third of the test, bugs A and C on
the second third, then just bug A on the last third.

One interesting kind of data is patterns of bug instability, and in particular, which bugs
alternate with each other, as B and C did in the preceding illustrations. Many of the observed
alternations will be spurious. B just happens to disappear at about the same time that C appears.
They may have no interesting relationship to each other. However, some of the observed
alternations seem highly significant. Not only do the bugs involved seem related intuitively, but the
same groups of alternating bugs appear much more frequently than chance would predict. These
significant alternations are termed bug migrations. A set of bugs that migrate into each other is
called a bug migration class. Thus B migrates with C in the inter-test example above, hence {B, C}
is a bug migration class. In the intra-test example, B alternated with C, but both bugs were absent
on the last third of the test. It is quite common for bugs to migrate with a correct version of the
procedure. To designate this, a null is used in the bug migration class: {B, C, ∅}.

As in the diagnosis of bugs, the diagnosis of bug migration requires careful analytical methods
in order to guard against false positives: mistaken claims that a certain bug or bug migration class
exists when in fact the cause of the observed behavior is just a chance alignment of unintentional
errors (slips). Although the analytical methodology for bugs is quite highly developed, the
equivalent technological development for bug migration has just begun.

Explaining bug migrations with overgeneralization

To return to Sleeman's point: certain cases of bug migration may be caused by
overgeneralization. To explain an observed bug migration class, {B, C}, one postulates that the
student has a generalized bug such that two ways to instantiate that generalized bug include the
observed bugs B and C. For instance, earlier we saw that overgeneralization yields a rule:

If there is a zero in the column, write the other digit in the answer,

which led ultimately to the bug Diff-0-N=N. A further generalization is the rule:

If there is a zero in the column, write one of the digits in the answer.

This rule predicts an observed bug migration. The bug migration class contains two bugs. The first
bug, Diff-0-N=0, solves problems as in a:

   a.   5 0     b.   5 0
      - 1 6        - 1 6
        4 0          4 6

It answers 0-N columns with zero. This bug results from instantiating the general rule by always
taking the column's top digit for the answer. The second bug, Diff-0-N=N, whose work appears
in b, results from instantiating the general rule another way, by always taking the column's bottom
digit. The bug migration class is {Diff-0-N=0, Diff-0-N=N}. This bug migration is rather
common. Figure 6-1 shows one student who exhibits it. On the first test, which was taken on a
Monday, the student has two bugs, Diff-0-N=0 and Borrow-Across-Zero. The latter bug doesn't
concern us here. (It affects problems \ and i/.) In all 0-N columns except one, the student
answers with 0. In the exception column, problem a, the student did 0-N=N. This is a rather
skewed example of intra-test bug migration. On the second test, taken two days later, the student
still had the bug Borrow-Across-Zero, but now the student migrates freely between Diff-0-N=0
(problems L n, p, L and u) and Diff-0-N=N (problems k, m and [illegible]). No instruction in
subtraction was given between the two tests. The bug migration is apparently a product of some
earlier experience. Overgeneralization offers one explanation.

The same generalized rule predicts a bug migration that involves exercises where there is a
zero in the bottom of a column. For N-0 columns, either the top or the bottom digit is written as
the answer. The migration is between Diff-N-0=N and Diff-N-0=0. The first instantiation,
N-0=N, is correct. So this actually predicts an intermittent bug, i.e., the bug migration class is
{Diff-N-0=0, ∅}. This bug migration has also been observed.
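The way one generalized rule fans out into a bug migration class can be sketched in Python. The sketch makes several assumptions: the rule is instantiated by a `choice` parameter of my own invention, borrowing is not modelled at all, and the problems 50-16 and 56-10 and the names `zero_column` and `buggy_subtract` are illustrative only.

```python
def zero_column(top, bottom, choice):
    """'If there is a zero in the column, write one of the digits.'
    choice selects the instantiation: 'top' or 'bottom'."""
    return top if choice == "top" else bottom

def buggy_subtract(top, bottom, choice):
    """Column-by-column subtraction in which every column containing a zero
    is answered by zero_column. Sketch only: other columns are assumed to
    need no borrowing."""
    digits = []
    while top > 0 or bottom > 0:
        t, b = top % 10, bottom % 10
        if t == 0 or b == 0:
            digits.append(zero_column(t, b, choice))
        else:
            digits.append(t - b)
        top, bottom = top // 10, bottom // 10
    return int("".join(str(d) for d in reversed(digits)) or "0")

# 0-N column: the two instantiations are Diff-0-N=0 and Diff-0-N=N.
print(buggy_subtract(50, 16, "top"), buggy_subtract(50, 16, "bottom"))

# N-0 column: one instantiation is correct, the other is Diff-N-0=0.
print(buggy_subtract(56, 10, "top"), buggy_subtract(56, 10, "bottom"))
```

A student who switches between the two instantiations from column to column would show exactly the intra-test migration described above.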

It seems that Sleeman's idea has some merit. Overgeneralization provides reasonable
explanations for certain bug migrations as well as for the existence of certain bugs. The next
section pushes farther.

Figure 6-1

Solution to two identical tests by student 1 of classroom 34.
First test is above the line, second test is below the line.

6.2 Stretching overgeneralization to account for certain bugs

Overgeneralization is a very powerful concept that can generate many bugs. On the other
hand, the collection of observed bugs is very diverse. It is not clear whether the diversity of the
bugs will defeat the generative power of overgeneralization. So far, overgeneralization has been
applied in limited ways. The illustrations applied it to classes of numbers (i.e., borrow from zero
was overgeneralized to borrow from identity element) and to locations (i.e., zero in the top digit
became zero anywhere in the column). This section leads off by discussing a bug migration class
that requires overgeneralization to act in new ways. The bug migration class has three bugs:

{Borrow-Across-Zero,
 Stops-Borrow-At-Zero,
 Smaller-From-Larger-Instead-of-Borrow-From-Zero}

These bugs are each fairly important bugs in that they often occur in competitive arguments later in
this document. They will be presented in some detail. An overgeneralization-based explanation
will be given for each. The first bug's explanation is fairly smooth. The second is a little rougher.
By the last one, overgeneralization will have been stretched to the breaking point.

The first bug in the class is Borrow-Across-Zero. This bug also cannot borrow from zeros.
When it encounters a BFZ situation, it locates a nearby non-zero digit in the top row and
decrements that instead. Figure 6-2 gives a problem state sequence illustrating it. The bug does its
relocated decrement between states a and b. It does the rest of the problem correctly. The rather
curious arrangement of decremented digits in the hundreds column is the hallmark of this bug.

To account for this bug with overgeneralization is not too hard. One postulates that the
student has only seen non-zero borrowing. That is, the student has seen 52-19 and 511-99, but
not 501-99. The student has induced that "the digit to decrement is the closest top-row digit
that is non-zero." This locative description is consistent with all the examples the student has
received. It seems a little bit strange that the overgeneralization should mention zero despite the
fact that the student has never seen a BFZ exercise. To justify its inclusion in the description, one
would have to postulate that zero is so salient to the learner that its presence or absence is always
recorded in generalizations. This is perhaps not implausible.



Figure 6-2

Problem state sequence for the bug Borrow-Across-Zero.

Figure 6-3

Problem state sequence for the bug Stops-Borrow-At-Zero.

The next bug in the bug migration class is Stops-Borrow-At-Zero (this bug was mentioned
earlier in section 19). When Stops-Borrow-At-Zero borrows from zero, it doesn't overwrite the
zero, but skips the decrement operation entirely. Figure 6-3 shows a problem state sequence for
this bug. The skipped borrow-from is evident in the sequence: the bug has already done the
second step of borrowing, borrow-into. The rest of the solution is correct. The missing decrement
is its only flaw. The missing decrement can be accounted for with overgeneralization by postulating
that the student believes that "decrementing zero is null stuff." Perhaps the student justifies this by
thinking, "If I have no apples, and you try to take one, nothing happens." This generalized
decrement operation accounts for Stops-Borrow-At-Zero.

A more difficult fact to explain is that this bug migrates with Borrow-Across-Zero. In fact,
the migration between Borrow-Across-Zero and Stops-Borrow-At-Zero is one of the most common
bug migrations observed. To account for the migration, a generalization must be found that unifies
the two generalizations:

1. Decrement the left-adjacent, top-row digit, where decrementing zero is null stuff.

2. The digit to decrement is the first non-zero, top-row digit.

It would be simple just to disjoin these two generalizations. The student would believe that
borrowing-from is either a null-stuff decrement or a decrement to a nearby digit. However,
inducing disjunctive concepts is ruled out by the one-disjunct-per-lesson hypothesis. Hence, the
disjoined concept must be present before instruction begins. A fairly exotic concept meaning
"decrement zero is null stuff or nearby stuff" would have to be available during the induction of
borrowing, perhaps by being in the base of primitive concepts.

The last bug in the bug migration class is Smaller-From-Larger-Instead-of-Borrow-From-Zero.
Like the other bugs, it solves simple borrowing exercises correctly, but deviates from the correct
algorithm when it is asked to borrow from zero. When a column requires a BFZ, the bug simply
takes the absolute difference in that column, avoiding borrowing of any kind. (Figure 6-4 shows a
problem state sequence.) The obvious explanation is that the student perceives BFZ as some kind
of difficulty and avoids it by taking the absolute difference instead of borrowing. This makes sense
if the student has not yet been taught BFZ and knows only how to do simple borrows. Note that
this intuitively appealing explanation uses a problem solving framework. It postulates that the
student detects problem situations and invents a way to avoid them. It falls outside the kinds of
overgeneralization-based explanations that are currently being sought for this bug migration class.

Figure 6-4

Problem state sequence for the bug Smaller-From-Larger-Instead-of-Borrow-From-Zero.






To explain this bug with overgeneralization would be very difficult. One would have to
postulate a way of viewing the borrow subprocedure as a whole, since it is the whole subprocedure
that is replaced by absolute difference. Presumably, such a viewpoint could be found, but the
theoretical costs of postulating such a large "primitive" would be high. Worse yet, this bug is in
the same bug migration class as the other bugs mentioned above. The new "large primitive" would
somehow have to generalize them as well.



The explanatory adequacy of overgeneralization

Essentially, these overgeneralization "accounts" are just building the observed bugs into the
set of primitives that are assumed to be present before learning begins. A wide range of primitive
concepts has been needed so far, just to capture three bugs. This would not be so bad if the
primitives that generated bugs were somehow a natural class, in that the class includes all concepts
of a certain kind. But if all concepts that are "similar" to the ones needed so far (whatever that
means) were allowed into the set of primitives, then the theory would overgenerate wildly. This
abandons any chance of empirical adequacy. The opposite course is to drop the constraint that the
set of primitives be somehow a natural class, and instead allow the theorist to dictate which
primitives are in the set. This would improve the empirical adequacy, but it sacrifices explanatory
adequacy. That is, the theory answers the question "Why does this bug exist?" by saying "because
this primitive concept exists," but it has no answer for the follow-up question, "Why does that
primitive exist?" Such a theory doesn't explain the bugs; it only relabels them. It lacks explanatory
adequacy.

6.3 Impasse-repair independence

The essential mechanisms of local problem solving are twofold. Problematic situations (called
impasses) are detected, then they are solved or avoided (called repairing the impasses). At once one
is struck by the apparent irrefutability of this framework. If the theorist is allowed to postulate
anything as an impasse and a repair, then the theorist is allowed, in essence, to insert arbitrary
condition-action rules into the procedure. The condition is the impasse and the action is the repair.
It is clear that any conceivable bug could be generated this way, by inserting the appropriate
condition-action rule qua impasse-repair combination. Such a framework would have no
explanatory value. If one asked it why a certain bug existed, it would answer only "because a
certain impasse-repair combination happens to exist."

The stipulation of an impasse and a repair would have some explanatory force if one could
provide independent evidence for the impasse and for the repair. That is, stipulating an impasse I1
and a repair R1 to explain a certain bug would be believable if one could also exhibit a second bug
generated by repairing the impasse I1 with another repair, R2. This would be independent
evidence for the stipulated impasse I1. Similarly, a good explanation requires independent evidence
for the repair, such as a bug that results from using the same repair on a different impasse, call it
I2. That is, to explain the original bug, one needs to produce the arrangement of evidence shown
in this table:

          I1        I2

R1        Bug       Bug'

R2        Bug''



The original bug to be explained is Bug. Bug' justifies the stipulated repair and Bug'' justifies
the stipulated impasse. Actually, if the goal is to ascertain whether the local problem solving






framework is correct, then it seems that requiring the existence of a fourth bug, the combination of
R2 with I2, is necessary. The essence of problem solving is that any solution that works is acceptable.
If there are several possible means to an end, and if problem solving is truly the activity going on,
then each means will eventually be applied by someone to achieve the goal (all other things being
equal). In this case, the goal (end) is to be in a non-impasse state. Hence, the basic notion of
problem solving predicts that all repairs will be applied, by someone at some time, to each impasse.
If a certain impasse-repair combination predicts a star bug, then the theory should provide an
explanation for why that repair was not a reasonable choice for solving the problem presented by
the impasse. If it could not, then one would begin to suspect that the framework wasn't really
problem solving but something else instead. In short, the independence of impasses and repairs is a
crucial, defining principle of local problem solving. The set of predicted bugs is exactly the set of
all repairs applied to all impasses. Any exceptions must be explained by the theory. Put
differently, Bugs = Impasses X Repairs, where X stands for the Cartesian product of two sets.

A Cartesian product bug pattern

The bug data have many instances of the kind of Cartesian product pattern that local problem
solving predicts. This will be illustrated with the three bugs mentioned earlier, paired with three
new bugs. The rest of this section presents this pattern. It and others like it are prime evidence for
the local problem solving framework. This Cartesian product pattern has two impasses and three
repairs:





             decrement zero                       larger from smaller

Noop         Stops-Borrow-At-Zero                 Blank-Instead-of-Borrow

Refocus      Borrow-Across-Zero                   Smaller-From-Larger

Backup       Smaller-From-Larger-Instead-         Doesn't-Borrow
             of-Borrow-From-Zero





Bugs from the same impasse are in the same column. Repairs label the rows of the bugs they
generate. The bugs will be discussed row by row.
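The Cartesian product pattern can be sketched in a few lines of Python. This is only an illustration, not part of the model: the dictionary simply transcribes the six bugs named in the table, and the helper names (`predicted_pairs`, `unexplained`) are hypothetical.

```python
from itertools import product

IMPASSES = ["decrement-zero", "larger-from-smaller"]
REPAIRS = ["Noop", "Refocus", "Backup"]

# Bug observed for each (repair, impasse) pair, transcribed from the table.
OBSERVED = {
    ("Noop", "decrement-zero"): "Stops-Borrow-At-Zero",
    ("Noop", "larger-from-smaller"): "Blank-Instead-of-Borrow",
    ("Refocus", "decrement-zero"): "Borrow-Across-Zero",
    ("Refocus", "larger-from-smaller"): "Smaller-From-Larger",
    ("Backup", "decrement-zero"): "Smaller-From-Larger-Instead-of-Borrow-From-Zero",
    ("Backup", "larger-from-smaller"): "Doesn't-Borrow",
}


def predicted_pairs(repairs, impasses):
    """Every repair applied to every impasse: Repairs X Impasses."""
    return list(product(repairs, impasses))


def unexplained(repairs, impasses, observed):
    """Predicted pairs with no observed bug; these are what the theory must explain."""
    return [pair for pair in predicted_pairs(repairs, impasses) if pair not in observed]
```

For the two impasses and three repairs above, `unexplained(REPAIRS, IMPASSES, OBSERVED)` is empty: every cell of the product is filled by an observed bug.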

Stops-Borrow-At-Zero:            345        345        207
                               - 102      - 129      - 169
                                 243 √      216 √       48 ×

Blank-Instead-of-Borrow:         345        345        207
                               - 102      - 129      - 169
                                 243 √      22  ×      1   ×

(Scratch marks from borrowing are not reproduced.)



Correctly answered problems are marked with √, and incorrectly answered problems with ×. The
first bug, Stops-Borrow-At-Zero, is generated by assuming that the student has not been taught how
to borrow from zero. When the student tries to use simple borrowing on BFZ problems, such as
the third problem, an attempt is made to decrement the zero. The student recognizes that zero
cannot be decremented. An impasse occurs. The student has detected a local problem that needs
to be solved before any more of the procedure can be executed. The repair, called Noop
(pronounced "no op"), simply causes the student to skip the stuck decrement action (i.e., it turns
the action into a "null operation" or "no-op" in computer jargon). This leads to
Stops-Borrow-At-Zero, shown above (see figure 6-3 for a problem state sequence illustrating its
solution).

The second bug is Blank-Instead-of-Borrow. Superficially, it looks very different from
Stops-Borrow-At-Zero. It doesn't do any borrowing, but instead leaves unanswered just those columns
that require borrowing. The explanation for this bug assumes that the student hasn't learned how
to borrow yet. When the student attempts to take a larger number from a smaller one, an impasse
occurs, presumably because the student knows that "you can't take a big number from a small one."
The repair to this impasse is the Noop repair. It causes the column difference action to be skipped.
This explains why borrow columns have blank answers. In general, the Noop repair causes actions
that are "stuck" to be skipped. It is perhaps the easiest of all possible repairs. It is a quite
straightforward solution to the problem of being unable to execute an action.
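As a rough illustration of the Noop repair, the sketch below simulates a student who has no borrowing procedure and skips any column difference that is "stuck." The function name and the string representation of the answer are assumptions made for the example; they are not taken from the model.

```python
def blank_instead_of_borrow(top, bottom):
    """Column subtraction by a student who cannot borrow and repairs with Noop.

    Returns the answer row as a string, with a space for each skipped column.
    """
    top_digits = [int(d) for d in str(top)]
    bottom_digits = [int(d) for d in str(bottom).rjust(len(top_digits), "0")]
    answer = []
    for t, b in zip(top_digits, bottom_digits):
        if b > t:
            # Impasse: "you can't take a big number from a small one."
            # Noop repair: skip the stuck column-difference action.
            answer.append(" ")
        else:
            answer.append(str(t - b))
    return "".join(answer)
```

On the three problems discussed above, the sketch leaves blank exactly the columns that would require borrowing: 345 - 102 gives "243", 345 - 129 gives "22 ", and 207 - 169 gives "1  ".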

Borrow-Across-Zero:              345        345        207
                               - 102      - 129      -  69
                                 243 √      216 √       48 ×

Smaller-From-Larger:             345        345        207
                               - 102      - 129      - 169
                                 243 √      224 ×      162 ×

(Scratch marks from borrowing are not reproduced.)



Borrow-Across-Zero is generated by applying the Refocus repair to the decrement-zero
impasse. The basic idea of the Refocus repair is to shift the external focus of attention, in this case,
where to perform the decrement operation. Refocus shifts focus in a way that maintains some
faithfulness to the procedure's description. As before, the assumptions are that the student knows
that zero can't be decremented but does not know how to borrow from zero. The procedure that
the student is following presumably describes the place to decrement as the top digit in the column
just left of the column currently being processed. Refocus relaxes that description somewhat,
shifting focus to the top digit in some column left of the current column. Any column that will
allow the decrement operation to succeed is a potential candidate. In this case, only the hundreds
column qualifies, so it is chosen. (Figure 6-2 gives the problem state sequence of the bug's
solution.)

Smaller-From-Larger answers columns that require borrowing with a number that is the
absolute difference of the two numbers. There are several ways to explain this bug. Here, the
assumption is that the student reaches an impasse because he must process a column where the
bottom digit is too large, and he understands that one can't take a larger digit from a smaller one.
The Refocus repair relaxes the description of the arguments to the column difference operation. It
relaxes the constraint on relative vertical positions. The operation is performed as if the column
were inverted. This allows it to answer the column, thus coping with the impasse.
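The two uses of Refocus described above can be sketched as follows. The function names are hypothetical, and choosing the nearest qualifying column is an assumption made for the example; the text only says that any column allowing the decrement is a candidate.

```python
def refocus_decrement_target(top_digits, column):
    """Pick the digit to decrement when borrowing into `column`.

    `top_digits` is most-significant-first. The procedure names the digit just
    left of `column`; Refocus relaxes this to some column further left whose
    digit can be decremented (here: the nearest one, an assumption).
    """
    for i in range(column - 1, -1, -1):
        if top_digits[i] > 0:
            return i
    return None  # no column qualifies


def refocus_column_difference(t, b):
    """Column difference with the vertical-position constraint relaxed."""
    return abs(t - b)


def smaller_from_larger(top, bottom):
    """Answer every column with the absolute difference of its two digits."""
    top_digits = [int(d) for d in str(top)]
    bottom_digits = [int(d) for d in str(bottom).rjust(len(top_digits), "0")]
    return "".join(
        str(refocus_column_difference(t, b))
        for t, b in zip(top_digits, bottom_digits)
    )
```

For the top row 2 0 7, a borrow into the units column is refocused past the zero onto the hundreds digit (index 0), and `smaller_from_larger(207, 169)` yields "162".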



Smaller-From-Larger-Instead-     345        345        30?
of-Borrow-From-Zero:           - 102      - 129      - 16?
                                 243 √      216 √      142 ×

Doesn't-Borrow:                  345        345        207
                               - 102      - 129      - 169
                                 243 √          ×          ×

(Scratch marks from borrowing are not reproduced.)



These two bugs illustrate the Backup repair. Backup is perhaps one of the most difficult
repairs to present, although it is underlyingly quite simple. The essence of the Backup repair is
retreating in order to take another alternative path. Backup resets the execution state of the
interpreter back to a previous decision point in such a way that when interpretation continues, it
will choose a different alternative than the one that led to the impasse that Backup repaired. In
most cases, using Backup causes a secondary impasse. This is just what happens with
Smaller-From-Larger-Instead-of-Borrow-From-Zero. As with the other bugs, the student reaches an impasse
trying to decrement the zero in the tens column. The Backup repair gets past the decrement-zero
impasse by "backing up," in the problem solving sense, to the last decision which has some
alternatives open. After the repair, the student tries to process the units column in the ordinary
way. Immediately he hits a second impasse, since he knows that one can't take a larger number
from a smaller one. This second impasse is repaired by Refocus, yielding absolute difference as the
answer in the units column. The student finishes up the rest of the problem without difficulty.
The derivation of this bug is a little complicated. One should perhaps just try to get a rough sense
of it now. Later, it will be presented in detail.

The bug Doesn't-Borrow is simple. Whenever it encounters a column that requires
borrowing, it gives up on doing the rest of the problem, and goes on to the next problem on the
test, if there is one. The bug is generated by applying the Backup repair to the impasse of being
unable to take a column difference. At this point in the procedure, the most recent decision is not
the decision about borrowing, because the student doesn't know about borrowing yet. Instead, the
most recent decision involves whether to do the problem at all. The Backup repair retreats to this
decision, and takes the open alternative: the student gives up on this problem, and goes on to the
next.
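The idea of retreating to the most recent decision that still has an open alternative can be sketched with an explicit decision stack. This is a simplified illustration under assumed names (`Decision`, `backup`) and assumed decision labels; it is not the formulation of Backup given later in the text.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    name: str
    chosen: str
    open_alternatives: list = field(default_factory=list)


def backup(decision_stack):
    """Retreat to the most recent decision with an open alternative and take it.

    Decisions with no alternatives left are popped off the stack. Returns
    (decision name, newly chosen alternative), or None if nothing is open.
    """
    while decision_stack:
        decision = decision_stack[-1]
        if decision.open_alternatives:
            decision.chosen = decision.open_alternatives.pop(0)
            return decision.name, decision.chosen
        decision_stack.pop()
    return None


# Doesn't-Borrow: the only open decision is whether to do the problem at all.
no_borrow_student = [Decision("do this problem?", "yes", ["go on to next problem"])]

# Smaller-From-Larger-Instead-of-Borrow-From-Zero: the borrow decision is open.
bfz_student = [
    Decision("do this problem?", "yes", ["go on to next problem"]),
    Decision("units column: borrow?", "borrow", ["process column ordinarily"]),
    Decision("where to decrement", "tens digit", []),
]
```

Here `backup(no_borrow_student)` abandons the problem, while `backup(bfz_student)` returns to the units column, where the student then meets the second (larger-from-smaller) impasse.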

The repair- impasse independence principle makes predictions 

A crucial fact about the repair process comes out clearly in the Cartesian product pattern. It
is the independence of repairs and impasses. Every repair is applicable to every impasse. In
principle, a bug will be found for each pairing of an applicable repair with an impasse.

Of course, some repairs are much more popular than others, and some impasses are more
common than others. Combining an unpopular repair with an uncommon core procedure may
predict a bug that has not yet been observed. In fact, several bugs have been predicted by
repair-impasse independence, then observed later. When the original model for repair theory was first
tested, in September 1979, it predicted 16 bugs that had not yet been observed. When its
predictions were tested against newly collected data in December 1979, 6 of the predicted bugs were
discovered (Brown & VanLehn, 1980). Since then, another of the original model's predicted bugs
has been discovered even though few new data have been acquired in the interim. So, one of the
chief advantages of the impasse-repair independence principle is that it makes predictions that can
be used to focus empirical investigations and to test the theory.



It is not the case that repairs and impasses are statistically independent. Although rare bugs
result from using uncommon repairs on uncommon impasses, it is not always the case that
combining a common repair and a common impasse results in a common bug. The frequencies of
the six bugs discussed above show this effect:



Repair-impasse independence vs. bug occurrence statistics

These figures show the number of students in a sample of 1147 who had the specified bug (see
appendix 3). In this sample, the two impasses are equally common. 110 students had the
decrement-zero impasse, and 116 students had the larger-from-smaller impasse. However, there is a
strong skew in repair preferences. The Noop and Refocus repairs were equally popular for the
decrement-zero impasse, but the Refocus repair was strongly preferred by students who can't
borrow. This shows that a simple assumption of statistical independence is quite unwarranted.
Repair-impasse independence does not mean statistical independence.

However, there are several problems with bug frequency data. If these can be solved,
statistical independence may be found. The main problem is that most bugs are rather uncommon,
occurring less than a half dozen times even in large samples (see appendix 3). This makes statistical
inferences unreliable. A more subtle difficulty is that many bugs have multiple causes. Multiple
derivations make bugs more common than simple frequency models would predict. For instance,
Smaller-From-Larger is common because it has at least two derivations: one as the application of
the Refocus repair and another as overgeneralization. The overgeneralization account is simply that
the learner chooses absolute difference as the generalization of examples such as

 5
-2
 3

On this account, students believe that 5-2 = 2-5 = 3 despite the fact that they have never seen
examples of the latter case, 2-5 = 3. Given this concept of column difference, the students solve
borrow columns, e.g., the units column of

42 
-15 
33 

without reaching an impasse. Thus, Smaller-From-Larger has a derivation as induction from an
impoverished set of examples. Accounting for bug frequencies would have to take such multiple
derivations into account.

Summary 

The main purposes of repair-impasse independence are (1) to capture an important trend in
the data, the Cartesian product pattern, (2) to give a rigorous expression of the basic notion of
local problem solving as multiple means to the same end, and (3) to rescue the theory from the
irrefutability of allowing the theorist to postulate arbitrary, non-independent impasse-repair pairs.
To the extent that the pattern holds across the data, the local problem solving framework is
vindicated. The local problem solving explanation loses its force if independence has too many
exceptions. To put it differently, the principle sets the default to independence. Any time a
particular repair-impasse combination leads to a star bug, the theory must explain why.

6.4 Dynamic vs. static local problem solving

In the AI literature, the basic idea of detecting problems in a procedure and fixing them is
not new. Sussman's HACKER program had two kinds of problem detection and rectification systems
(Sussman, 1976). One acted dynamically, that is, during the execution of the procedure. It would
detect problems such as trying to place a physical object in a space occupied by another object.
The second system acted statically: it would examine the procedure as a goal-subgoal hierarchy,
looking for patterns of conflicting goals. It could thus detect some problems without ever running
the procedure. The same choice exists for local problem solving in this domain: impasses can be
detected and repaired dynamically or statically. To put it intuitively, the issue is when the local
problem solving process is carried out by the student. It could be that the local problem solving
process is something like forgetting or mislearning. It could happen while the student is sleeping,
or watching the teacher, or explaining the procedure to a friend. All that one can see in the
Cartesian product pattern is the result of repair, and not when it happened. This section is a
competitive argument between two approaches. The two hypotheses are (LPS will be used to
abbreviate "local problem solver"):

1. Dynamic LPS: impasses are detected during the execution of the procedure. Repairs are
made to the current state; the procedure itself is not modified.

2. Static LPS: impasses are detected by analyzing the structure of the procedure without
executing it. Repairs are made by changing the procedure's structure.

Really, there is hardly any controversy (that is why the argument has not been given a chapter of its
own). The evidence is clearly on the side of dynamic LPS.

Bug migration and stable long-term bugs

Intuitively, bug migration is a strong argument for the dynamic local problem solving
hypothesis. But as it turns out, the static LPS hypothesis can do just as well at predicting bug
migration, although it must be accompanied by a simple (and ad hoc) ancillary assumption.

As discussed earlier in this chapter, bug migration is the phenomenon of a student switching
among two or more bugs during a short period of time with no intervening instruction. The bugs
the student switches among are called a bug migration class. The theory aims to predict which sets
of bugs will occur as bug migration classes. With regard to local problem solving, the basic idea is
that the bugs in a bug migration class result from applying different repairs to the same impasse.
That is, the student appears to have the same procedure throughout the period of observation, but
chooses to repair its impasses differently at different times. This basic idea is independent of
whether local problem solving takes place statically or dynamically. Figure 6-5 presents an example
of bug migration among several of the bugs discussed earlier. It illustrates how several bugs can
occur on the same test by application of different repairs to the same impasse.

Figure 6-5

Solution to a test by student 22 of classroom 34
showing intra-test bug migration.

Figure 6-5 is an exact reproduction of a test taken by student 22 of class 34. She misses only
six problems, namely the ones that require borrowing from zero. The first two problems she misses
(the second and third problems on the fourth row) are answered as if she had the bug
Stops-Borrow-At-Zero. That is, she gets stuck when she attempts to decrement a zero, and uses the Noop
repair in order to skip the decrement operation. The next two problems she misses (the first two
problems on the last row) are answered as if she had the bug Borrow-Across-Zero. She hits the
same impasse, but repairs by relocating the decrement leftward using the Refocus repair. On the
third problem of the last row, she uses two repairs within the same problem. For the borrow
originating in the tens column, she uses Backup to retreat from the decrement-zero impasse. She
winds up writing a zero as the answer in the tens column (as if she had the bug
Zero-Instead-of-Borrow-From-Zero). In the hundreds column, she takes the same Refocus repair that she used on
the preceding two problems. On the last problem, she uses the Noop repair for both borrows.

Patches present a problem for the dynamic LPS hypothesis 

The student of figure 6-5 is typical in that her repairs occur in runs. The first two repairs are
one kind, the next two are another, and so on. This observation suggests that there can be a
temporary association of an impasse with a repair. These pairs are called patches. Apparently, the
first time the student of figure 6-5 hit the impasse, she searched for an applicable repair and not
only used it but created a patch to remember that she used it. On the next problem, she again
encounters the impasse, but instead of searching for a new repair she just retrieves the patch, and
uses its repair. She completes the next problem without encountering the impasse, which is
apparently enough to cause her to forget her patch, since the next time she hits the impasse, she
repairs it a new way. Either the patch was forgotten during the non-impasse problem, or she chose
to ignore it and try a different repair. The latter possibility is supported by her behavior at the end
of the test, where she is applying different repairs for each impasse even when the impasses occur in
the same problem. In short, there seems to be some flexibility in whether patches are ignored, and
perhaps also in how long they are retained.
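The patch idea can be pictured as a small memory that maps an impasse to the repair last used on it. The sketch below is purely illustrative: the class name, the random choice among repairs, and the `use_patch` and `forget` controls are assumptions introduced for the example, since the text leaves open how patches are selected, ignored, or forgotten.

```python
import random


class PatchMemory:
    """Hypothetical store of impasse -> repair associations (patches)."""

    def __init__(self, repairs, rng=None):
        self.repairs = list(repairs)
        self.patches = {}
        self.rng = rng if rng is not None else random.Random()

    def repair_for(self, impasse, use_patch=True):
        """Reuse the stored patch if allowed; otherwise pick a repair and store it."""
        if use_patch and impasse in self.patches:
            return self.patches[impasse]
        repair = self.rng.choice(self.repairs)
        self.patches[impasse] = repair
        return repair

    def forget(self, impasse):
        """Drop the patch, e.g. after a problem that did not raise the impasse."""
        self.patches.pop(impasse, None)
```

A run of identical repairs corresponds to repeated calls with the patch retained; a switch to a different bug corresponds to a call after `forget`, or with `use_patch=False`.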

Inter-test bug migration exhibits a more extensive use of patches. Inter-test bug migration is
detected by testing students twice a short time apart (say, two days) with no intervening instruction.
The student has a consistent bug on each test, but not the same bug. The bugs are related in that
they can be generated by different repairs to the same impasse. It appears that the student has
retained the procedure between the two tests, but the patch that was used on the first test was not
retained. Instead, a new repair was selected, stored as a patch, and used consistently throughout the
second test.

Bugs do not always migrate. Some bugs are held for months or years. Apparently, patches 
can be stored for long periods of time. 

Bug migration was predicted in advance of its observation (Brown & VanLehn, 1980). It falls
out as a natural consequence of viewing the repair process as modifying the execution (short term)
state of the processor that interprets the stored procedure. The dynamic LPS hypothesis naturally
predicts bug migration. What it has trouble with is explaining the repetition of the same repair to
the same impasse, a phenomenon referred to above as creating, storing and reusing a patch. The
dynamic LPS hypothesis could explain this as a chance selection of the same repair over and over
again. However, it is much more plausible to add an ancillary hypothesis that some kind of patch
creation and storage exists. The patch hypothesis is difficult to verify since there is no way to tell
whether or not a student has a patch (they could just have chosen the same repair twice). The only
argument for their existence is intuitive plausibility. Nonetheless, encountering seventh graders with
bugs that were acquired in the third grade is, for me, a fairly compelling demonstration that patches
exist, even if they aren't a proper part of the model.

The static LPS hypothesis' account for bug migration

So far, bug migration has been discussed in terms of the dynamic LPS hypothesis. However,
the static LPS hypothesis can generate the same predictions. The forte of static LPS is making
permanent changes in the procedure. It examines the procedure's structure in order to find (or
predict) impasses and install patches. This is all done without executing the procedure. The
installation of a patch into the procedure naturally predicts stable, long-term bugs. Bug migration is
more problematic. To predict bug migration, one must assume that it is possible to have stochastic
patches: a random variable governs which action the patch will take. The various bugs in a bug
migration class result from the patch switching at random among various subpatches built by the
local problem solver. This is somewhat implausible, perhaps.

However, the bare fact is that bug migration and long-term bugs both exist, and that the
dynamic and static hypotheses each predict one naturally, but require a supplementary hypothesis
to account for the other. The dynamic LPS hypothesis requires patch abstraction and storage; the
static LPS hypothesis requires stochastic patches. So there is really no decisive argument here. We
must look a little deeper.

Impasses and repairs need dynamic information 

The preceding argument tried to relate the model's chronology, the sequence of derivational
events, to real time. Such performance arguments are often quite slippery. Memory can always be
used to shuttle hypothetical cognitive events forwards in time in order to satisfy the exigencies of
the observations. Indeed, the argument above ended inconclusively. Laying time aside, the main
difference between static LPS and dynamic LPS hypotheses is the kind of information available to
the local problem solver. A dynamic local problem solver has the current state (i.e., active goals, a
partially worked exercise). The static local problem solver has the procedure's calling structure
(goal-subgoal hierarchy). The static local problem solver can perhaps examine all the failure modes
of a primitive operator, such as decrement, and decide what to do for each one. However, there are
intricate ways that a procedure can fail during execution. For the static local problem solver to find
them, it would have to simulate running the procedure, and hence it would become, in effect, a
dynamic local problem solver.

As an example, consider the bug Borrow-Across-Zero. Under the static LPS hypothesis, this
bug is generated by assuming that the student has never learned how to borrow from zero, and that
the static LPS has built into the procedure a patch so that when decrement fails by trying to
decrement a zero, the focus of attention is shifted left to a nearby, non-zero digit, which is
decremented instead of the zero. Figure 6-6a shows Borrow-Across-Zero solving a problem. Notice
that when it borrows in order to answer the tens column, it must decrement the hundreds column a
second time (problem state e). This creates a rather unusual combination of scratch marks. The
very first time this student could have seen such scratch marks is the first time the student solved a
BFZ problem. Until the student actually tackles the first BFZ problem, static local problem solving
would have no reason to suspect such a strange double-decrement situation might arise.

Figure 6-6

(A) Problem state sequence for Borrow-Across-Zero.
(B) Problem state sequence for the bug set (Borrow-Across-Zero !Touched-Zero-Is-Ten).



Some students have a variant of the bug which indicates that, for them, double decrementing
is an unusual enough action that it warrants repair. An attempt to decrement an already
decremented digit causes an impasse. Figure 6-6b shows the bug set (Borrow-Across-Zero
!Touched-Zero-Is-Ten). Just after problem state d, it attempts to decrement the hundreds column a
second time. However, the student did not do the decrement, taking an impasse instead. The
Noop repair was applied, causing the decrement to be skipped. The student then did the second
part of borrowing, the addition of ten to the tens column (state e), and finished the problem
correctly. This shows that double-decrementing can cause local problem solving. If the static LPS
hypothesis is to account for this, it must assume that the LPS is very smart. The LPS has to plan
ahead to realize that decrements might stack up in some unusual situations, and prepare a repair for
this case. This is entirely implausible.
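The dependence on runtime information can be made concrete with a small simulation. The sketch below is an illustration only: it models Borrow-Across-Zero as "decrement the nearest non-zero digit to the left," keeps a runtime record of which digits already carry scratch marks, and optionally treats a second decrement of the same digit as an impasse repaired by Noop. The function name, the flag, and the problem 207 - 69 are choices made for the example; it does not reproduce figure 6-6.

```python
def borrow_across_zero(top, bottom, repair_double_decrement=False):
    """Simulate Borrow-Across-Zero on top - bottom, working from the units column.

    If `repair_double_decrement` is true, an attempt to decrement a digit that
    has already been decremented is treated as an impasse and skipped (Noop).
    """
    t = [int(d) for d in str(top)]
    b = [int(d) for d in str(bottom).rjust(len(t), "0")]
    decremented = set()  # runtime record: columns whose top digit has scratch marks
    answer = [0] * len(t)
    for col in range(len(t) - 1, -1, -1):
        if t[col] < b[col]:
            # Borrow: decrement the nearest non-zero digit to the left,
            # skipping over zeros (the Refocus repair).
            target = next((i for i in range(col - 1, -1, -1) if t[i] > 0), None)
            if target is not None:
                if repair_double_decrement and target in decremented:
                    pass  # impasse on double decrement; Noop skips the decrement
                else:
                    t[target] -= 1
                    decremented.add(target)
            t[col] += 10  # second part of borrowing: add ten to the column
        answer[col] = t[col] - b[col]
    return int("".join(str(d) for d in answer))
```

The set `decremented` exists only while the problem is being worked, which is the point of the argument: a purely static analysis of the procedure has no such record to consult.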

The preceding example showed that impasse detection required dynamic (runtime) information.
There is a similar argument that shows that repairs also need dynamic information. The argument
involves the Backup repair. It can be shown (see section 3 of appendix 9) that Backup is most
simply formulated as a modification of the execution state rather than a modification to the
structure of the procedure. The argument rests on the fact that in certain situations where Backup
has been observed, there are two instantiations of a certain goal and Backup only goes to one of
them. Static Backup can't discriminate among several dynamic instantiations of the same goal, but
dynamic Backup can. Although a complex patch could be constructed by static local problem
solving, it would essentially have to do exactly what a dynamic Backup would do anyway. So a
dynamic version of Backup is the simplest, most natural way to handle these special cases of local
problem solving.



Summary 

The information that is available statically is just not sufficient to explain the kinds of local
problem solving that occur. Local problem solving makes essential use of information that is
naturally available at runtime. To generate the information statically would require such powerful
simulation capabilities of the static local problem solver that it would become essentially equivalent
to a dynamic local problem solver. So the static LPS approach is just not workable.

6.5 Formal hypotheses

The main ideas of local problem solving are impasses, repairs, their independence, and their
embedding in the dynamic, runtime environment. To express these formally, a description of how
procedures are executed is needed. That is, the gross architecture of a procedural interpreter will be
used to formalize the theory. Chapter 3 began the formalization by postulating an undefined
function, Cycle, that maps a runtime state into the next runtime state. It is used to describe the
observable actions of the student when the student applies a procedure to a runtime state. Chaining
applications of Cycle generates the procedure's solution to an exercise problem. In order to
formalize local problem solving, Cycle will be defined in terms of several new, undefined
functions. The nomenclature that will be used is:

P                 A variable designating the student's procedure.

S                 A variable designating the current runtime state.

(Internal S)      A function that returns the internal (execution, or interpreter) runtime state
                  of S.

(External S)      A function that returns the external (problem) runtime state of S. The
                  runtime state is a composite of the internal and external state.

(Cycle P S)       A function that inputs a procedure and a runtime state and outputs a set
                  of next states. It represents one cycle in the interpretation/execution of
                  the procedure.

(Interpret P S)   An undefined function that expresses what the procedure does without
                  local problem solving. It represents the "normal" interpretation of the
                  procedure. It inputs a procedure and a runtime state and outputs the
                  next runtime state. It represents one cycle of the interpreter. It will be
                  defined later by the procedural representation language.

(Repair S)        An undefined function that takes a runtime state and returns a set of
                  runtime states corresponding to various repairs.

(Impasse S)       An undefined predicate on states. It is true of a runtime state if the
                  combination of execution and problem state constitutes an impasse.

The basic technique used to formalize local problem solving is the same as the one used to
formalize learning. In this case, two undefined functions are used: Repair and Impasse. The
predictions of the theory will be made in terms of them. Various constraints (hypotheses) will be
placed upon them. The actual functions used in Sierra are just one way of instantiating the two
functions in obedience to the constraints. In particular, Repair is formalized by a set of five
repairs: Noop, Backup, Refocus, Force and Quit. The formalization of Impasse is similar. A set
of impasse conditions is defined. For instance, precondition violations are one kind of impasse
condition. Impasse conditions are to Impasse as repairs are to Repair.

Given the nomenclature, the basic architecture of the combined interpreter and local problem
solver is defined by the following hypothesis:

Local problem solving
Let

(Cycle P S) =

if ¬(Impasse S) then {(Interpret P S)}

else {(Interpret P S') | S' ∈ (Repair S) and ¬(Impasse S')}

This hypothesis defines the Cycle function in terms of the three undefined functions, Repair,
Impasse, and Interpret. Interpret is the normal interpretation of the procedure. If the
current state is not an impasse, then Interpret is what Cycle does. If there is an impasse, then
a repair is inserted before the Interpret. Because more than one repair is possible, there may be
more than one successor state. Hence, Cycle returns a set of states.
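
The definition of Cycle can be sketched in Python. This is only an illustration under stated
assumptions, not Sierra's implementation: interpret, impasse and repair are passed in as ordinary
functions standing for the undefined functions Interpret, Impasse and Repair, a list stands in for
the set of successor states, and the toy instantiation at the bottom (an integer state, an impasse
at zero, three made-up repaired states) is hypothetical.

```python
from typing import Callable, Iterable, List, TypeVar

P = TypeVar("P")  # a procedure
S = TypeVar("S")  # a runtime state


def cycle(
    procedure: P,
    state: S,
    interpret: Callable[[P, S], S],
    impasse: Callable[[S], bool],
    repair: Callable[[S], Iterable[S]],
) -> List[S]:
    """Return the possible next runtime states for one cycle."""
    if not impasse(state):
        # Normal case: exactly one successor.
        return [interpret(procedure, state)]
    # Impasse: apply every repair, drop repaired states that are still
    # impasses (the filter), then interpret each surviving state.
    return [interpret(procedure, s) for s in repair(state) if not impasse(s)]


# Hypothetical toy instantiation (not from the theory): the state is a digit,
# interpreting means decrementing it, and zero cannot be decremented.
def toy_interpret(procedure, state):
    return state - 1


def toy_impasse(state):
    return state == 0


def toy_repair(state):
    return [0, 10, 5]  # three made-up repaired states; 0 is still stuck


normal = cycle(None, 3, toy_interpret, toy_impasse, toy_repair)
repaired = cycle(None, 0, toy_interpret, toy_impasse, toy_repair)
```

In the toy run, normal holds a single successor, while repaired holds one successor for each
repaired state that survives the filter. This is the sense in which Cycle is set-valued.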

There are several tacit features that are built into the definition of Cycle. Although it is
redundant, it is useful to break these out as separate hypotheses. This makes it easier to refer to the
concepts later.

Dynamic LPS

The local problem solver reads and changes the dynamic (execution time) state, but it does not
change the procedure's structure.

Repair-impasse independence

Any repair can be applied to any impasse.

Filter-trigger symmetry

An impasse condition triggers local problem solving if and only if it also acts as a filter on repairs.

The first two hypotheses have already been discussed. The last one reflects the idea that repairs
actually fix impasses. That is, ¬Impasse must be true after the Repair function is done. It turns
out that some repairs change the state in such a way that the new state is an impasse, perhaps of a
different kind than before. That is, the repair doesn't really fix the problem; the interpreter is still
stuck. Such repairs are filtered. To put it differently, if a certain impasse condition is sufficient to
cause repairs (trigger local problem solving), then it is also effective in filtering repairs. All this
follows from the basic notion that local problem solving really is a form of problem solving.

Chapter 7
Deletion



Two sources of bugs have been identified so far. One is overgeneralization, or rather correct
induction from impoverished sets of examples (see section 6.1). The other source uses local
problem solving to repair impasses, which are caused ultimately by incomplete learning (see section
6.3). In a sense, these two explanations fall under the broad headings of learning and invention.
This chapter shows that a third source of bugs exists, something akin to forgetting. It perturbs the
structure of a learned (core) procedure. For historical reasons, the new source of bugs is called
deletion. This chapter discusses the reasons why the theory needs deletion. It presents a certain
group of bugs and discusses three explanations for them:

1. The bugs result from local problem solving applied to procedures generated
by partially completed learning.

2. The bugs result from overgeneralization.

3. The bugs result from deletion of part of a learned procedure.

It is shown that neither of the first two explanations for the bugs works. This justifies introducing a
new formal mechanism, deletion, into the theory.

7.1 Local problem solving will not generate certain bugs

Many bugs can be accounted for by incomplete learning followed by local problem solving.
The basic idea is that the student is tested on skills that either have not been taught yet, or haven't
been mastered. This often leads to impasses and repairs and, in turn, to bugs. However, this
account will not work for certain bugs. For handy reference, these bugs will be called the deletion
bugs. Explaining them is the target of this chapter.

The deletion bugs are best understood in contrast to bugs generated by local problem solving.
The following bug can be generated by incomplete learning and repair:

Stops-Borrow-At-Zero:     345       345       307       307
                         -102      -129      -169      -  9
                         ----      ----      ----      ----
                          243 ✓     216 ✓     148 ✗     308 ✗

The procedure behind this bug does not know how to borrow across zeros. It borrows correctly
from non-zero digits, as shown in the second problem. On the third problem, it attempts to
decrement the zero, hits an impasse, and repairs by skipping the decrement operation entirely (the
Noop repair). The point is that this bug has a complete, flawless knowledge of borrowing from
non-zero digits, but it doesn't know anything about borrowing from zero. Precisely at one of the
lesson boundaries in the subtraction curriculum, its understanding stops. Now compare this
knowledge state with the one implicated by the following bug:

Don't-Decrement-Zero:     345       345       307       307
                         -102      -129      -169      -  9
                         ----      ----      ----      ----
                          243 ✓     216 ✓     148 ✗    2108 ✗

This bug also misses just BFZ problems ("BFZ" abbreviates "borrow from zero"). Indeed, it gets
the same answer on the third problem as the previous bug, Stops-Borrow-At-Zero. However, it
solves BFZ problems in a very different way. Notice the fourth problem. The borrow in the units
column caused some, but not all, of the BFZ subprocedure to be executed. The following problem
state sequence shows the initial problem solving:

     2              2 10           2 10 17        2 10 17
a.   3  0  7   b.   3  0  7   c.   3  0  7   d.   3  0  7
   -       9      -       9      -       9      -       9
                                                        8

Most of the BFZ subprocedure is there. What is missing is its last action, decrementing the ten in
the tens column to nine, which should occur between states b and c. Because the bug does some of
the BFZ subprocedure, it is likely that subjects with this bug have been taught borrowing across
zero. But it is also clear that they did not acquire all of the subprocedure, or else forgot part of it.
If the subtraction curriculum was constructed so that teachers first taught one half of borrowing
across zero and some weeks later taught the other half, then one would be tempted to account for
this bug with incomplete learning. But BFZ is, in fact, always taught as a whole. So some other
formal technique is implicated in this bug's generation. Don't-Decrement-Zero is one of the
deletion bugs. Several others are detailed in chapter 10.

The case has been made that incomplete traversal of the curriculum will not generate a
procedure that is appropriate for explaining this bug. Another way to make the same point is to
note that repair could, in principle, generate the bug by using a Noop or Backup repair that would
cause the tens-column decrement to be skipped. However, in order to have a repair, one must have
an impasse. In this case, the impasse needs to be just before the decrement (i.e., between states b
and c above). However, there is no apparent reason for an impasse there. The decrement is merely
subtracting one from ten, an easy, unproblematic operation. No impasse condition that I know
of will cause an impasse there. Without an impasse, there is no way to use Noop or other repairs
to generate the bug. Again, the conclusion is that some other mechanism must be utilized to derive
this bug.
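
The contrast between the two bugs can be made concrete with a small simulation. The sketch below
is a hypothetical Python rendering, not Sierra's procedure representation: the function names, the
variant labels, and the hard-wiring of each bug as a flag are assumptions made for illustration. In
particular, the theory derives Stops-Borrow-At-Zero from an impasse followed by a Noop repair, and
Don't-Decrement-Zero from deletion, whereas the sketch simply encodes the resulting behavior. It
also assumes that the top number is at least as large as the bottom number.

```python
def borrow_from(top, col, variant):
    """Borrow from column `col` of the top number (a list of ints)."""
    if col < 0:
        raise ValueError("nothing left to borrow from")
    if top[col] != 0:
        top[col] -= 1  # ordinary borrow from a non-zero digit
        return
    # Borrow from zero (BFZ).
    if variant == "stops_borrow_at_zero":
        return  # decrementing the zero is skipped entirely
    borrow_from(top, col - 1, variant)  # borrow from the next column left
    top[col] = 10  # the zero becomes ten
    if variant != "dont_decrement_zero":
        top[col] -= 1  # last BFZ action: ten becomes nine


def subtract(top_number, bottom_number, variant="correct"):
    """Column-by-column subtraction; returns the written answer as a string."""
    top = [int(d) for d in str(top_number)]
    bottom = [int(d) for d in str(bottom_number).rjust(len(top), "0")]
    written = []
    for col in range(len(top) - 1, -1, -1):  # right to left
        if top[col] < bottom[col]:
            borrow_from(top, col - 1, variant)
            top[col] += 10
        written.append(str(top[col] - bottom[col]))
    return "".join(reversed(written)).lstrip("0") or "0"


problems = [(345, 102), (345, 129), (307, 169), (307, 9)]
results = {
    variant: [subtract(t, b, variant) for t, b in problems]
    for variant in ("correct", "stops_borrow_at_zero", "dont_decrement_zero")
}
```

Under these assumptions the two buggy variants reproduce the answers shown above: both give 148
on 307 - 169, but on 307 - 9 they give 308 and 2108 respectively. The only difference between
the correct variant and the Don't-Decrement-Zero variant is the single skipped action at the end
of the BFZ branch.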



7.2 Overgeneralization should not generate the deletion bugs

Overgeneralization can generate Don't-Decrement-Zero, but only at the cost of losing
explanatory adequacy. The trick to an overgeneralization-based derivation is to induce that the
decrement action in question is optional. One assumes that the student has received examples
teaching BFZ. The examples will, of course, have the decrement action. For some reason, the
student induces that this decrement is optional, even though all the examples happened to have it.
On a test, the student instantiates the generalized subprocedure by choosing not to make the
optional decrement. This generates the bug. The other deletion bugs can be generated with similar
optionality-based inductions. Optionality is a plausible primitive concept for a procedural
representation language to have, so there is nothing wrong a priori with this explanation.

The problem is with the nature of optionality. Inducing an optional fragment of a
subprocedure is inducing a disjunction. The procedure acquires a choice about whether or not to
execute the action. Thus, inducing a disjunction inside the subprocedure violates the
one-disjunct-per-lesson hypothesis. To put it differently, it is a felicity condition that the teacher
will show the student when a disjunction is needed. But all the examples used a decrement; none
omitted it. The student has no evidence that a disjunction is needed. It is a direct violation of the
felicity condition to put one in. To admit this violation just to generate a few bugs wrecks an
otherwise explanatory framework.

7.3 The problems of defining a deletion operator

It has been shown that the two bug-generating pathways that the theory currently provides are
ineffective in generating the deletion bugs. This motivates including a new operator in the theory.
On the basis of the bug Don't-Decrement-Zero, it seems that some kind of deletion operator will do
the job, something that removes an action such as decrement from a sequence of actions in a
subprocedure.

It is not easy to formalize deletion in an empirically adequate way. Richard Young and Tim
O'Shea used a model based on deleting production rules to generate some of the most common
bugs (Young & O'Shea, 1981), including most of the deletion bugs. However, their approach could
also generate star bugs. Given their production system, which has 22 rules, there are 22 possible
rule deletions. However, only 7 of the possible 22 rule deletions generate bugs. Deleting certain of
the other rules generates star bugs. In general, totally free, unconstrained deletion overgenerates
wildly. One fix is to allow the theorist to specify which rules may or may not be deleted. This just
transforms the question of why only certain bugs exist into the question of why certain rules
get deleted and not others. It doesn't explain very much. So the real problem with deletion is to
put just the right constraints on it so that no star bugs are predicted and yet the deletion bugs are
generated. Chapter 10 gives this tricky issue a full discussion.

For now, deletion will be formalized using an undefined function, Delete, which mutates
procedures. It is interposed between the output of Learn and the input to Cycle. To capture
this formally, the following hypothesis is used:

Deletion

If P is a core procedure, then all P' ∈ (Delete P) are core procedures as well.

The function Delete is set-valued to capture the fact that there is often more than one possible
deletion that can be made to a procedure. As with the other main undefined functions in the
theory, Delete will be defined by acquiring more constraints upon its behavior.
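
As an illustration of why unconstrained deletion overgenerates, Delete can be sketched as a
set-valued function over a procedure represented as a flat tuple of rules. This representation, the
placeholder rule names, and the optional deletable predicate are all assumptions made for the
sketch; the theory leaves Delete undefined at this point and constrains it only later.

```python
def delete(procedure, deletable=lambda rule: True):
    """Return every procedure obtained by removing one deletable rule."""
    return {
        procedure[:i] + procedure[i + 1:]
        for i, rule in enumerate(procedure)
        if deletable(rule)
    }


# Hypothetical 22-rule production system; the rule names are placeholders.
rules = tuple(f"rule{i}" for i in range(1, 23))
unconstrained = delete(rules)
constrained = delete(rules, deletable=lambda rule: rule in {"rule3", "rule7"})
```

With 22 distinct rules the unconstrained version yields 22 candidate procedures, one per rule. A
theorist-supplied predicate narrows the set, but, as argued above, stipulating which rules are
deletable does not explain why those rules are deleted and not others.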
Chapter 8
Summary: Architecture Level



The preceding five chapters laid out the general architecture of the model and defended the
main principles of the theory. This chapter summarizes both, makes a few comments, and
introduces some of the issues discussed in following chapters.

8.1 The architecture of the model 

The expositional strategy of this document is to start with an architecture composed of
undefined functions, then to add constraints that gradually define the functions. At the top level,
the architecture consists of three undefined functions, Learn, Delete and Cycle. Learn takes a
lesson and a procedure as inputs; it returns a set of procedures. Learn represents the various ways
that its input procedure can be augmented in order to assimilate the lesson. Delete takes a
procedure as input; it returns a set of procedures as output, where each procedure is the result of
deleting some part of the input procedure. In a later chapter, it will be shown that it simply deletes
a rule from the And-Or graph that represents the procedure. The need for a deletion operator that
is distinct from learning is argued for in chapter 7. The third undefined function, Cycle,
represents one cycle in the interpretation/execution of a given procedure. Its inputs are a procedure
and a "runtime state." A runtime state is a composite whose parts are an external state (i.e., a
problem state) and an internal state (i.e., the interpreter's state). A runtime state represents the kind
of information that can change while the procedure is running. The access function (External
S) returns the external state of a given runtime state, S. Similarly, (Internal S) returns the
interpreter's state of S. The function Cycle computes the "next" runtime state. It takes a
procedure and a runtime state, and it returns a set of runtime states. It returns a set because
interpretation of the procedure is sometimes non-deterministic. Several states are possible "next"
states. Given these functions, the top level of the model is defined by the following hypotheses:

Incremental Learning

Given a lesson sequence L_1 ... L_n and an initial procedure P_0:
Procedure P_i is a core procedure if
(1) P_i = P_0, or
(2) P_i ∈ (Learn P_{i-1} L_i) and P_{i-1} is a core procedure.

Deletion

If P is a core procedure, then all P' ∈ (Delete P) are core procedures as well.

Predictions

If S_0 is the initial state such that (External S_0) is a test exercise, then the set of predicted
problem state sequences for students with core procedure P is exactly the set
{<(External S_0) ... (External S_n)> | ∀i S_i ∈ (Cycle P S_{i-1})}.

These hypotheses say that the basic architecture has two simple cycles. One cycle acquires a
procedure, and the other cycle executes a procedure to solve a problem. The acquisitional cycle is
called the learner in Sierra. It runs once for each lesson: First it executes Learn on a lesson,
producing a set of procedures, then it executes Delete, which augments the set. The resulting set
of core procedures is fed back as input procedures for the next cycle of the learner. The other cycle
is called the solver in Sierra. It executes a core procedure to solve some problems. It includes both
normal execution and local problem solving. Roughly speaking, the solver cycle happens once for
each action of the procedure. The grain of the cycle cannot be stated more precisely until the
procedure representation language becomes more defined. The relationship between the learner
and the solver is discussed further in section 2.1.
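
The learner cycle described here can be sketched as a loop over lessons. The sketch is one reading
of the paragraph above under stated assumptions: learn and delete are passed in as functions
standing for the undefined Learn and Delete, procedures are assumed to be hashable, Delete is
applied once per lesson rather than repeatedly, and the toy stand-ins at the bottom (a procedure
as a tuple of lesson names) are hypothetical.

```python
def run_learner(initial_procedure, lessons, learn, delete):
    """Return the set of core procedures available after the last lesson."""
    core = {initial_procedure}
    for lesson in lessons:
        learned = set()
        for procedure in core:
            learned |= set(learn(procedure, lesson))
        augmented = set(learned)
        for procedure in learned:
            augmented |= set(delete(procedure))  # Delete augments the set
        core = augmented  # fed back as input for the next lesson
    return core


# Hypothetical toy stand-ins: learning appends the lesson name, and
# deletion removes any single element.
def toy_learn(procedure, lesson):
    return {procedure + (lesson,)}


def toy_delete(procedure):
    return {procedure[:i] + procedure[i + 1:] for i in range(len(procedure))}


after_two = run_learner((), ["borrow", "bfz"], toy_learn, toy_delete)
```

Even in this toy setting the set of core procedures grows quickly, because every procedure that
Delete adds becomes an input to Learn on the next lesson.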

Observable predictions consist of problem state sequences. Each predicted student behavior is
a set of problem state sequences, one sequence per test problem. (In particular, the solver generates
intra-test bug migrations as well as stable bugs.) These whole-test sequences could be compared
directly to student behavior. However, given the numbers of students, core procedures, and repairs
involved, such a direct comparison would be an awesome task. A much simpler test of the theory is
used. It is described in section 2.1.
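
The Predictions hypothesis amounts to enumerating every chain of Cycle applications and
recording the external state at each step. The following sketch shows one way to do that. It
assumes, since the text does not say how a chain ends, that cycle returns an empty collection
when the procedure has finished, and it adds a max_steps bound as a safety net. The toy cycle and
the (internal, external) pair used as a runtime state are hypothetical.

```python
def predicted_sequences(procedure, initial_state, cycle, external, max_steps=100):
    """Enumerate the problem state sequences reachable by chaining cycle."""
    sequences = []

    def extend(state, path, steps):
        successors = list(cycle(procedure, state)) if steps < max_steps else []
        if not successors:
            sequences.append(tuple(path))  # the chain ends here
            return
        for next_state in successors:
            extend(next_state, path + [external(next_state)], steps + 1)

    extend(initial_state, [external(initial_state)], 0)
    return sequences


# Hypothetical toy: a runtime state is an (internal, external) pair.
def toy_cycle(procedure, state):
    internal, ext = state
    if internal == 0:
        return [(1, ext + "a"), (1, ext + "b")]  # two successors
    if internal == 1:
        return [(2, ext + "!")]
    return []  # finished


sequences = predicted_sequences(None, (0, ""), toy_cycle, lambda s: s[1])
```

Each point where the toy cycle returns more than one successor splits the prediction into several
sequences, which is how a single core procedure can predict several distinct behaviors on the
same test problem.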

8.2 Hypotheses and their support 

The preceding formalisms serve basically as a framework on which more substantive
hypotheses are hung. As mentioned, the expository tactic is to begin with undefined functions and
slowly define them. Most of the preceding chapters were concerned with defining Learn and
Cycle. This section presents each of the remaining "substantive" hypotheses.

Induction

If P_i ∈ (Learn P_{i-1} L_i) then for each example problem x in (Examples L_i), the problem
state sequence that is P_i's solution to x is equal to the problem state sequence that is the
solution to x used in the example.

This hypothesis says that mathematical skill acquisition is inductive in character. The
arguments supporting it are in chapter 3. Inductive learning is the only learning framework of
those discussed that is consistent with the gross features of classroom learning. The induction
hypothesis is framed carefully to work with the deletion hypothesis, which was stated above.
Although both Learn and Delete produce core procedures, the core procedures that Delete
produces do not generalize the lessons' examples. In a rough sense, the sequence of actions
produced by a deleted procedure is the same as the sequence of actions from the undeleted
procedure except that a few of the actions are removed. Delete-produced core procedures are not
consistent with the lessons, but Learn-produced ones are.

Show-work

In worked examples of a lesson, all objects mentioned by the new subprocedure are visible,
unless the lesson is marked as an optimization lesson.

This hypothesis expresses a key felicity condition. Students act as if they believe that the
teacher will always "show all the work" while doing example exercises. For the portion of the
procedure that introduces the new subprocedure, the teacher is expected to write down intermediate
results that are normally held mentally. For instance, the first lesson on carrying has examples like
this one:

The units column sum, 12, appears explicitly rather than being held mentally as it will be when
carrying is eventually mastered. Such mastery is taught in separate lessons that show how to avoid
some of the writing by holding intermediate results mentally. These lessons are specially marked.
The theory's term for them is optimization lessons. Sierra doesn't handle optimization lessons,
mostly because optimization lessons are not used in teaching subtraction and the other skills that
data are available on. The show-work hypothesis is one solution to the invisible objects problem of
inductive learning (see chapter 5). Other solutions have the same empirical adequacy as show-work,
but they have less explanatory adequacy. They do not explain why teachers almost always show
their work, nor do they explain why there are two kinds of lessons.

In order to formalize the next felicity condition, it is convenient to use three new undefined
functions. The formerly undefined function Learn will be defined in terms of them. The three
new functions are listed below, with informal explanations. Two simple helping functions are
defined as well.

(Induce P XS)   represents disjunction-free induction. The first argument, P, is a procedure.
                The second, XS, is a set of worked example exercises. The function returns a
                set of procedures. Each procedure is a generalization of P that will solve all
                the exercises the same way that they are solved in the examples. Induce is
                not permitted to introduce disjunctions. If the procedure cannot be
                generalized to cover the examples, perhaps because a disjunction is needed,
                then Induce returns the null set.

(Disjoin P XS)  represents the introduction of a disjunction (e.g., conditional branch) into P,
                the procedure that is its first argument. The second argument, XS, is a set of
                examples. Disjoin returns a set of procedures. Each procedure has had
                one disjunction introduced into it. The disjunction is chosen in such a way
                that Induce can generalize the procedure to cover all the examples in XS. If
                there is no way to introduce a single disjunct that will allow all the examples
                to be covered, then Disjoin returns the null set.

(Practice P XS) represents another kind of disjunction-free generalization, one driven by
                solving a set of practice exercises, XS. Practice returns a set of procedures,
                each one a generalization of its input procedure P.

(Examples L)    This access function returns the sequence of worked examples of the lesson L.

(Exercises L)   This access function returns the sequence of practice exercises of the lesson L.

Given these functions, the remaining felicity condition can be simply stated:

One-disjunct per lesson
Let

(Learn P L) =

if (Induce P (Examples L)) ≠ ∅ then (Learn1 P L)

else (Learn2 P L)

where

(Learn1 P L) =
{ P" | ∃ P' such that P' ∈ (Induce P (Examples L))
and P" ∈ (Practice P' (Exercises L)) }

and

(Learn2 P L) =
{ P" | ∃ P' such that P' ∈ (Disjoin P (Examples L))
and P" ∈ (Learn1 P' L) }.

Moreover, (Induce P XS) and (Practice P XS) do not introduce into P any new
disjunctions or any new disjuncts on old disjunctions, and (Disjoin P XS) inserts into P
exactly one new disjunction or one new disjunct on an old disjunction.

The first part of the hypothesis says, essentially, that Learn performs the functions Disjoin,
Induce and Practice, in that order. However, Disjoin is skipped if it is unnecessary for the
particular lesson. The last two clauses of the hypothesis express a key idea: Students learn at most
one subprocedure (disjunct) per lesson. Put differently, the students act as if they believe that the
teacher has designed the lesson sequence in such a way that introduction of a new disjunct
(subprocedure) always falls on a lesson boundary. Some subprocedures (disjuncts) may take
several lessons to learn, but no lesson introduces more than one. The arguments for the hypothesis
are in chapter 4.
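
The way Learn is assembled from Induce, Disjoin and Practice can be sketched in Python. The
sketch assumes that Disjoin is attempted only when disjunction-free induction returns the null
set, which is how the statement that Disjoin is skipped when unnecessary is read here. The three
component functions are passed in as parameters because the theory leaves them undefined. The
lesson as a dictionary with "examples" and "exercises" keys, and the toy component functions at
the bottom, are hypothetical.

```python
def learn1(p, lesson, induce, practice):
    """Induce on the worked examples, then Practice on the exercises."""
    return {
        p2
        for p1 in induce(p, lesson["examples"])
        for p2 in practice(p1, lesson["exercises"])
    }


def learn2(p, lesson, induce, disjoin, practice):
    """Introduce one disjunct, then proceed as in learn1."""
    return {
        p2
        for p1 in disjoin(p, lesson["examples"])
        for p2 in learn1(p1, lesson, induce, practice)
    }


def learn(p, lesson, induce, disjoin, practice):
    """Skip Disjoin when disjunction-free induction already succeeds."""
    if induce(p, lesson["examples"]):
        return learn1(p, lesson, induce, practice)
    return learn2(p, lesson, induce, disjoin, practice)


# Hypothetical toy: a procedure is a frozenset of subprocedure names and
# each example is labeled with the one subprocedure it requires.
def toy_induce(p, examples):
    return {p} if all(x in p for x in examples) else set()


def toy_disjoin(p, examples):
    missing = set(examples) - p
    return {p | missing} if len(missing) == 1 else set()


def toy_practice(p, exercises):
    return {p}


start = frozenset({"simple"})
one_new = {"examples": ["simple", "borrow"], "exercises": []}
two_new = {"examples": ["borrow", "bfz"], "exercises": []}
learned = learn(start, one_new, toy_induce, toy_disjoin, toy_practice)
blocked = learn(start, two_new, toy_induce, toy_disjoin, toy_practice)
```

In the toy run, a lesson that needs one new subprocedure is learned, while a lesson that would
need two new subprocedures at once yields the null set, since Disjoin may add only one disjunct
per lesson.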

This felicity condition is one solution to the disjunction problem of inductive learning.
Although one of the competing hypotheses is just as empirically adequate, one-disjunct-per-lesson
has the added value that it explains why lessons are so often used and why they are
helpful when they are used. The other approach would work equally well with a homogeneous
sequence of examples rather than the partitioned sequence, defined by the lesson boundaries, that is
actually used. Since the other approach ignores lessons, it can't explain why the lesson convention
has been universally adopted as a helpful educational framework. Since the felicity condition does
explain the use of lessons, it has greater explanatory adequacy.

In order to state the remaining hypotheses, three more undefined functions will be introduced.
They will be used to define Cycle, thereby allowing the architecture of the local problem solver to
be cleanly expressed.

(Interpret P S) represents one cycle of the normal interpretation (execution) of the procedure
                P. The second argument, S, is a runtime state. Interpret returns the next
                runtime state. Interpret is defined by the representation language used
                for procedures.

(Impasse S)     is a predicate that is true when the runtime state S is an impasse. It is
                considered to be implemented by a set of impasse conditions. If any impasse
                condition is true, then Impasse is true. Impasse represents the problem
                detection component of local problem solving.

(Repair S)      represents the other half of local problem solving, repair. It is considered to
                be implemented by a set of repairs, such as Noop and Backup. Repair
                returns a set of runtime states. Each state results from the action of one of
                the repairs on the input state S.

With these functions, it is simple to state the main hypothesis governing local problem solving.

Local problem solving
Let

(Cycle P S) =

if ¬(Impasse S) then {(Interpret P S)}

else {(Interpret P S') | S' ∈ (Repair S) and ¬(Impasse S')}

The usual cycle is simply to execute the function Interpret once. However, if Impasse is true,
then an execution of Repair is inserted before the Interpret. Repair outputs a set of states.
Some of these are filtered: if Impasse is true of a state, then that state is not passed to
Interpret. Usually, several of Repair's output states are left after filtering. Hence, the
execution cycle becomes non-deterministic at this point. Several ideas behind this hypothesis are so
important that it is best to break them out separately, as "corollaries" of the main hypothesis, so
that they can be easily referenced later.

Repair-impasse independence

Any repair can be applied to any impasse.

Filter-trigger symmetry

An impasse condition triggers local problem solving if and only if it also acts as a filter on
repairs.

These two corollaries emphasize that local problem solving really is problem solving, where
the problem is being stuck. The problem is not solved until the procedure is unstuck (filter-trigger
symmetry). Moreover, it doesn't matter how one gets unstuck as long as one succeeds
(repair-impasse independence). The arguments for local problem solving are presented in chapter 6.
It is shown that the theory could do without it and still generate some bugs, but in doing so it would
lose much of its ability to explain those bugs. Essentially, it would have to build certain observed
bugs explicitly into the set of primitives that are assumed to be present before learning begins.
Thus, it would offer no explanation for why these bugs occur and not others. Another "corollary" of
the local problem solving hypothesis is

Dynamic LPS

The local problem solver reads and changes the dynamic (runtime) state, but it does not change
the procedure's structure.

This says that impasses and repairs affect the runtime state rather than the procedure's
structure. The kind of information that impasses and repairs need is available at runtime but not
statically. The arguments for this hypothesis are in chapter 6.

There is another key feature of local problem solving that needs mentioning despite the fact
that I have no defense to give for it. It is difficult to state formally, although the basic idea is clear.
The repairs that have been discovered so far are extremely simple local changes to the interpreter's
state. Also, impasse conditions are unsophisticated, local checks. The local problem solver does not
seem to do any large computations, nor does it look ahead to see the consequences of its actions or
the interpreter's actions. The local problem solver doesn't really go looking for trouble, but when it
encounters some it just barges through it, expending as little work as possible. To summarize this
general impression, it is convenient to name the hypothesis:

Locality

The repairs and impasse conditions are local.

This doesn't really constrain the model so much as express an orientation or direction in the
ongoing endeavor of making local problem solving more precise.

8.3 Commentary on the arguments and inherent problems 

The arguments presented in the preceding chapters have nothing to do with the way the
principles were actually discovered. The tales of the principles' discoveries deserve to be told late at
night over a couple of beers, if at all. The supporting arguments were constructed more recently.
A main task of their construction was the discovery and understanding of the problems that the
principles solve. This was sometimes a nontrivial task. For instance, it was plain to see what
happened when the show-work hypothesis was turned off in Sierra: the model overgenerated wildly.
But it was not clear whether this was a problem with the particular knowledge representation being
used or whether the explosion was due to a more general problem. It appears now to be a general
problem, labelled the invisible objects problem. It seems to be a problem that affects any inductive
account of learning, despite the fact that it has slipped by unnoticed in virtually all AI work on
inductive learning. For lack of a better word, such general problems will be labelled inherent
problems, because they seem inherent in any study of the domain.

Three kinds of inherent problems occurred in the preceding chapters. One inherent problem
was figuring out how much of the classroom experience could be ignored. There isn't much to say
about this problem. One takes a broad look at the phenomena and their context, makes a guess,
and constructs a theory. In this case, it is fairly clear that induction is a reasonable guess. For skills
other than mathematics, it may be much less clear which frameworks will yield successful theories.

A second kind of inherent problem involves what one could loosely call laws of information.
The problems seem to be inherent to any thinker, mechanical or human, that performs the given
information processing task. In this case, two inherent problems with induction were encountered:
the disjunction problem and the invisible objects problem. Such informational problems are
extremely subtle. They are subtle in two ways. First, it is hard to discover that the problems are
there and what their exact nature is. For instance, the disjunction problem, which is well known to
philosophers, has not been generally acknowledged by AI researchers until recently. Some linguists
still tend to misunderstand it as a problem concerning the presence or absence of negative examples
(see section 4.1). Information is apparently very slippery stuff. One can get buried in the
formalisms used to express and manipulate it, so buried that a whole learning machine can be
constructed without ever realizing that one has somehow solved an informational problem or even
that the problem was there at all.

The second subtlety with informational problems comes out clearly in the arguments of
chapters 4 and 5. Certain solutions to the disjunction problem and the invisible objects problem are
often extremely difficult to differentiate on empirical grounds. For instance, it is difficult to
differentiate the hypothesis that learners introduce the minimal number of disjuncts from the
hypothesis that they introduce at most one disjunct per lesson. To split these hypotheses is not
possible with the current lesson sequences since the hypotheses make identical predictions using
them. Empirical tests that could differentiate the hypotheses would require difficult and morally
questionable educational experiments. The subtlety of splitting hypotheses about how people solve
informational problems makes sense in the context of the collective experience with computer
programming. It is an axiom of programming that there are many ways to solve an information
processing problem. Some may perform very similarly despite significant underlying differences.
By analogy, there must be many possible solutions to human information processing problems. It
will not always be simple to tell which one(s) people use.

The third kind of inherent problem is relatively straightforward. There are certain patterns in
the data that stick out like sore thumbs. The problem is to account for them. Three major patterns
were discussed in the preceding chapters: overgeneralization bugs (section 6.1), Cartesian product
bugs (section 6.3), and deletion bugs (section 7.1). Each has an intuitively compelling explanation.
In these cases, the explanations are based on overgeneralization, invention and forgetting,
respectively. However, closer examination reveals that the phenomena can be accounted for by
other means. In fact, any one of the three mechanisms (overgeneralization, invention, and
forgetting) can account for all three patterns. However, in doing so, they take on an
unconstrained, stretched aspect. Stretching the mechanisms to cover phenomena for which they are
ill-suited leads to a lack of explanatory adequacy.

In short, solving even the simplest inherent problems in the theory requires appealing to 
explanatory adequacy. This was quite a surprise to me. Before the arguments were carefully
worked out, I had expected empirical evidence to resolve most of the arguments. In fact, it does do
most of the work, but it seems always to fall a little short of eliminating the last one or two 
competitors. 

8.4 Preview of Part 2 

Part 2, the representation level, consists of chapters 9 to 16. It tackles the problems of
knowledge representation, although details of the syntax of the knowledge representation are
delayed until the next level. The representational level addresses issues concerned with capabilities
and expressive power. It addresses questions such as: should procedures be finite state automata,
stack automata, or something more exotic? Should patterns have the full descriptive power of a
first-order logic? How much flexibility should there be in storing and restoring focus of attention?
What about "short-term memory" for numbers? 

The organization of the exposition divides representational issues along fairly traditional
computer science lines. The first two chapters discuss control flow and data flow. The terms are
taken from Rich and Shrobe (1976), who adopted these concepts in order to analyze programs from
a language-independent position. (A more common use of the term "data flow," as in data flow
computer languages (Dennis, 1974), has different connotations than the ones intended here.) The
tack taken in these two chapters is to find out what constraints should be placed on the
representation's ability to express control flow and data flow, and indeed, whether they should be
separated at all. Chapter 13 concerns how procedures should interface with the external world. The
external world of a computer program is usually an operating system, and the interface to it is
notoriously ad hoc. For the procedures of mathematical calculation, the external world is a writing
surface, such as a piece of paper. The interface is concerned with how that resource is addressed,
read and written. For instance, how is the paper searched to locate information fitting a partial
specification? To answer this question, the interface chapter describes the patterns that can be used
for specifications and the kind of searches that can take place. 

The modularity hypothesis and argumentation 

Representational questions such as the ones above are extremely general. They can be 
construed to cut across many task domains (e.g., the issue of working memory). One way to argue
these issues is to refer to results from all over psychology. Thus, results from digit span
experiments would be used to justify a particular choice of working memory, e.g., a buffer with
7±2 cells. This style of argumentation was pursued by Newell and Simon in their work on human
problem solving (1972). I doubt that I could improve on that magnificent accomplishment.

However, the underlying premise of that style of argumentation is that it is valid to use results
from all sorts of tasks to argue for a particular information processing task. In particular, the
assumption is that the mind is like a general purpose computer: the mind uses the same architecture
to perform a huge variety of tasks by loading itself with different programs. This premise has
recently come under heavy fire in cognitive science. 

Fodor, Chomsky and others of the MIT school of cognitive science have argued that it makes
just as much sense to assume that the mind is modular (Fodor, 1983). Their claim is that mental
architectures are specialized for the processing that they do. By analogy with programs, the
modularity hypothesis is plausible. Computer science has found that there are some things that the
von Neumann architecture (the one used by most computers) is poor at, such as certain kinds of
pattern recognition. Yet other architectures have been devised that do such tasks rapidly with
simple programs. If the modularity hypothesis is true, then the style of argumentation used by
Newell and Simon is no longer valid. To be trustworthy, an argument can use data only from the
task at hand. This is exactly what the arguments of the next level do.

Registers and stacks 

One of the earliest and most fundamental changes in computer programming languages was 
the move from register-oriented languages to stack-oriented languages. In register-oriented
languages, one represents programs as flow charts or their equivalent. The main structure for
regulating control flow is the conditional branch. Data flow is implemented as changes to the
contents of various registers. Stack-oriented languages added the idea of a subroutine, something
that could be called from several places and when it was finished, control would return to the caller
of the subroutine. While the register-oriented languages need only a single pointer to keep track of
the control state of the program, stack-oriented languages need a last-in first-out stack so that the
interpreter can tell not only where control is now (the top of the stack), but where it is to return to
when the current subroutine gets done (the next pointer on the stack), and so on. The shift of
computer science to stack-orientation also augmented the representation of data flow. A new data
flow facility was to place data on the stack, as temporary information associated with a particular
invocation of a subroutine. In particular, subroutines could be called recursively with parameters
(arguments).

The fundamental distinction between register-orientation and stack-orientation has lapsed into
historical obscurity in computer science, but surprisingly, psychology seems to be somewhat slow in
making the transition. When a psychologist represents a process, it is frequently a flow chart, a
finite state machine or a Markov process that is employed. Even authors of production systems,
who are often computer scientists as well as psychologists, sometimes give that knowledge
representation a register orientation: working memory looks like a buffer, not a stack, and
productions are often not grouped into subroutines. For some reason, when psychologists think of
temporary memory, whether for control or data, they think of global resources, such as registers.
This tradition shows signs of changing. More recent production system architectures, such as
Anderson's ACTF (1982), use goal stacks, subroutines, argument passing, etc.

There are well known mathematical results concerning the relative power of finite state 
automata, register automata and push down automata. Some of these results have been applied to 
mental processes such as language comprehension (see Berwick (1982) for a review). However, I
find myself rather unconvinced by such arguments. As Berwick and others have pointed out, these
arguments must make many assumptions to get off the ground, and not all of them are explicitly
mentioned, much less defended.

The project of these chapters is to argue for a modern representation of core procedures based
on a rich structure of motivated assertions: the hypotheses of the architecture level. The discussions
in the architecture level did not make strong assumptions about the representation language because
they dealt with the facts at a medium-high level of detail. The arguments in the representational
level show what must be assumed of the representation language in order to push the structure of
the architecture down to a low enough level that precise predictions can be made, and made
successfully. That is, they show what aspects of the knowledge representation are crucial to the
theory.

The objective of this chapter is to show that the control structure is recursive. The argument
starts with a minimum of assumptions about control. Instead, the hypotheses on local problem
solving and subprocedure acquisition from earlier chapters will be used. However, it is necessary to
speak in an informal way of goals and subgoals, with the intention that these be taken as referring
to the procedural knowledge of subtraction itself, rather than expressions in some particular
representation (e.g., production systems, And-Or graphs, etc.). In particular, it will be assumed that
borrowing is a subgoal of the goal of processing a column, and that borrowing has two subgoals,
namely borrowing-into and borrowing-from. Borrowing-into is performed by simply adding ten to a
certain digit in the top row, while the borrowing-from subgoal is realized either by decrementing a
certain digit, or by invoking yet another subgoal, borrowing-from-zero. These assumptions, or at
least some assumptions, are necessary to begin the discussions. They are some of the mildest
assumptions one can make and still have some ground to launch from.

Three control regimes are considered in this chapter: 

1. Finite state automata: The internal, execution state for the core procedure is limited to a
   single "you are here" pointer. It indicates which state (or goal, or rule, or other construct in
   the procedural representation) is currently executing. The procedure may or may not be
   structured hierarchically. However, if it is, it may not have self-embedding subprocedures, i.e.,
   subprocedures that call themselves recursively either directly or via other subprocedures.

2. Push down automata: The internal, execution state for the procedure contains a last-in,
   first-out goal stack. The stack stores the currently executing goal's state by pushing. It resets
   the control state to a saved goal by popping. The procedure's structure may have recursive
   subprocedures.

3. Coroutines: Coroutines are independent parallel processes, each roughly equivalent to a push
   down automaton. They are taken as a representative of the class of higher-order control
   structures.

The third alternative isn't considered as seriously as the others. The main competition is between
finite state and push down automata.

Formal automata results 

There are formal results concerning the expressive power of these control regimes. It can be
proved that there are certain tasks that can be accomplished by procedures written for push down
automata, and yet no procedure written for a finite state automaton can perform the tasks in their
full generality. These formal results are irrelevant here for several reasons: (1) The procedures for
mathematical skills can be easily expressed for finite state automata. (2) The formal expressability
arguments turn on the fact that a push down automaton's stack can be infinitely large. A push
down automaton with a finite upper bound on its stack length is equivalent in power to a finite
state automaton. An infinite stack is physically impossible to implement on material information
processors, including brains and digital computers. There are no true push down automata in the
material world.

These trite observations show the impotency of expressability arguments for empirical theories
(but not for mathematical ones). The psychologically interesting issues concern how closely the
automata's architectures approximate the structure of the mind's information. To put it differently,
the question is which control structure best fits the observed procedure, where "fit" is evaluated by
seeing whether the control structure enables the procedure's representation to be simple while
capturing the empirical evidence.

This chapter offers two kinds of arguments. One concerns local problem solving and the
other concerns learning. Both arguments show that a stack-based architecture simplifies their
respective components, the local problem solver and the learner, while capturing the empirical facts
in a natural way. The arguments concerning the local problem solver are rather complex. Despite
the fact that they are some of the strongest and most elegant arguments in the whole document,
they are also the longest, so they have been moved to an appendix (appendix 10, sections 1 and 2).
Only a synopsis will be presented here.

9.1 Chronological, Dependency-directed and Hierarchical backup

Control structure is not easily deduced by observing sequences of writing actions. Too much
internal computation can go on invisibly between observed actions for one to draw strong inferences
about control flow. What is needed is an event which can be assumed or proven to in some sense
be the result of an elementary, indivisible control operation. The instances of this event in the data
should shed light on the basic structures of control flow. Such a tool is found in a particular repair
called the Backup repair. It bears this name since the intuition behind it is the same as the one
behind a famous strategy in problem solving: backing up to the last point where a decision was
made in order to try one of the other alternatives. This repair is crucial to the argument, so it is
worth a moment to introduce it.

Figure 9-1 is an idealized protocol of a subject who has the bug
Smaller-From-Larger-Instead-of-Borrow-From-Zero. The (idealized) subject does not know about
borrowing from zero. When he tackles the problem 305-167, he begins by comparing the two digits
in the units column. Since 5 is less than 7, he makes a decision to borrow (episode a in the figure), a
decision that he will later come back to. He begins to tackle the first of borrowing's two subgoals,
namely borrowing-from (episode b). At this point, he gets stuck since the digit to be borrowed from
is a zero and he knows that it is impossible to subtract a one from a zero. He's reached an impasse.
The Backup repair gets past the decrement-zero impasse by "backing up," in the problem solving
sense, to the last decision which has some alternatives open. The backing up occurs in episode c,
where the subject says, "So I'll go back to doing the units column." He takes one of the open
alternatives, namely to process the units column in a normal, non-borrowing way. Doing so, he hits
a second impasse, saying, "I still can't take 7 from 5," which he repairs ("so I'll take 5 from 7
instead"). He finishes up the rest of the problem without difficulty. His behavior is that of
Smaller-From-Larger-Instead-of-Borrow-From-Zero.

   a.    3  0  5     In the units column, I can't take 7 from 5, so I'll
       - 1  6  7     have to borrow.

   b.    3  0  5     To borrow, I first have to decrement the next
       - 1  6  7     column's top digit. But I can't take from 0!

   c.    3  0  5     So I'll go back to doing the units column. I still can't
       - 1  6  7     take 7 from 5, so I'll take 5 from 7 instead.
               2

   d.    2 10
         3  0  5     In the tens column, I can't take 6 from 0, so I'll have to borrow.
       - 1  6  7     I decrement 3 to 2 and add 10 to 0. That's no problem.
               2

   e.    2 10
         3  0  5     Six from 10 is 4. That finishes the tens. The hundreds is
       - 1  6  7     easy, there's no need to borrow, and 1 from 2 is 1.
         1  4  2

Figure 9-1

Pseudo-protocol of a student performing the bug
Smaller-From-Larger-Instead-of-Borrow-From-Zero.

From the pseudo-protocol, it is clear that the Backup repair sends control back to some
previous decision point so that a different alternative can be pursued. The critical question is, what
determines the decision point that Backup will return to? There are three well known backup
regimes used in AI:

1. Chronological Backup: The decision that is returned to is the one made most recently,
   regardless of what part of the procedure made the decision.

2. Dependency-directed Backup: A special data structure is used to record which actions depend
   on which other actions. When it is necessary to back up, the dependencies are traced to find
   an action that doesn't depend on any other action (an "assumption" in the jargon of
   Dependency-directed backtracking). That decision is the one returned to.

3. Hierarchical Backup: To support Hierarchical Backup, the procedure representation language
   must be hierarchical in that it supports the notion of goals with subgoals, and the interpreter
   must employ a goal stack. In order to find a decision to return to, Backup searches the goal
   stack starting from the current goal, popping up from goal to supergoal. The first (lowest)
   goal that can "try a different method" is the one returned to. Such a goal must have subgoals
   that function as alternative ways of achieving the goal and moreover, some of these
   alternative methods/subgoals must not have been tried by the current invocation of the goal.
   When Backup finds such a goal on the stack, it resets the interpreter's stack in such a way
   that when the interpreter resumes, it will call one of the goal's untried subgoals. (In AI, this
   is not usually thought of as a form of Backup. It is sometimes referred to by the Lisp
   primitives used to implement it, e.g., THROW in Maclisp, and RETFROM in Interlisp.)

The key difference among these backup regimes is, intuitively speaking, which decision points the
interpreter "remembers." These establish which decision points the Backup repair can return to. In
Chronological and Dependency-directed backtracking, the interpreter "remembers" all decision
points. In Hierarchical backup, it forgets a decision point as soon as the corresponding goal is
popped. The critical case to check is whether students ever back up to decision points whose
corresponding goals would be popped if goal stacks were in use. If they don't return to such
"popped" decision points, then Hierarchical Backup is the best model of their repair regime. On
the other hand, if students do return to "popped" decisions, then either Chronological or
Dependency-directed Backup is the better model. Evidence is presented in appendix 10 that
students never back up to popped decision points. By "never" I mean that returning to popped
decision points generates star bugs. This evidence vindicates Hierarchical backup, and shows that
(1) procedures' static structure has a goal-subgoal hierarchy, and (2) a goal stack is used by the
interpreter in executing the procedure. In short, push down automata are better models of control
structure than finite state automata.
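
The contrast between remembering all decision points and remembering only those whose goals
are still on the stack can be sketched in a few lines of Python. This is only an illustration
under assumptions, not the model's implementation: the Decision record, the function names and
the second, hypothetical history are invented for the example, and Dependency-directed Backup
is omitted because it needs a dependency record that is not specified here.

```python
from dataclasses import dataclass, field

@dataclass
class Decision:
    goal: str                                     # goal that made the decision
    untried: list = field(default_factory=list)   # alternatives still open
    popped: bool = False                          # True once the goal has left the stack

def chronological_backup(history):
    """Most recent decision with an open alternative, popped or not."""
    for d in reversed(history):
        if d.untried:
            return d
    return None

def hierarchical_backup(history):
    """Lowest goal still on the goal stack that has an open alternative."""
    for d in reversed(history):
        if d.untried and not d.popped:
            return d
    return None

# 305 - 167 at the decrement-zero impasse (figure 9-1, episode b).
impasse_305_167 = [
    Decision("ProcessColumn(units)", untried=["ordinary column difference"]),
    Decision("Borrow"),
    Decision("BorrowFrom"),     # borrowing from zero is unknown to this subject
]

# Hypothetical history in which a goal with an open alternative has already popped.
hypothetical = [
    Decision("ProcessColumn(units)", untried=["ordinary column difference"]),
    Decision("FinishedGoal", untried=["other method"], popped=True),
    Decision("Borrow"),
]
```

On the first history both regimes return to the column-processing decision, as in the
pseudo-protocol. On the hypothetical history they diverge: chronological_backup returns to the
popped goal, which is the case the evidence in appendix 10 is said to rule out, while
hierarchical_backup skips it.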

9.2 One-disjunct-per-lesson entails recursive control structure

There is a second argument for a recursive control regime. It shows that recursive control
structure is necessary for the one-disjunct-per-lesson hypothesis to be true. That is, if the language
can't use recursion, then the one-disjunct-per-lesson hypothesis cannot be imposed without causing
the theory to lose empirical adequacy. The argument involves learning a certain way to borrow
across zero, one that borrows center-recursively. In fact, it is the most widely taught BFZ (i.e.,
borrow from zero) method. It has been used throughout this document for examples. It is
exemplified in the following problem state sequence:

[Problem state sequence, not reproduced: five successive states of the problem 305 - 129, showing
the scratch marks of borrowing across the zero and ending with the answer row 176.]

The zero has ten added to it, then the three is decremented, then the newly created ten is
decremented.

The claim is that the only way to learn this way of borrowing in a non-recursive language
violates the one-disjunct-per-lesson hypothesis. To make the argument concrete, a particular
non-recursive language, namely flow charts, is used. Figure 9-2a shows borrowing from a core
procedure that only knows how to borrow from non-zero digits. Figure 9-2c shows borrowing after
borrowing across zero, in the fashion above, has been learned. Clearly, there are two branches to
learn. One moves control leftward across a row of zeros, and the other moves back across them
until the column originating the borrow is found (i.e., the Home? predicate is true of the column
B). There are many other ways that borrowing could be implemented, but if recursive control is
not available, they would all have to have two loops: one for traversing columns leftward, and
one for traversing columns rightward.



[Figure 9-2: three flow charts, not reproduced. The boxes that survive, in reading order:
   a. (AddTen B); B<-(NextLeft B); (Decrement B); B<-(NextRight B)
   b. (AddTen B); B<-(NextLeft B); test (Zero? B); (AddTen B); B<-(NextLeft B); (Decrement B);
      B<-(NextRight B); (Decrement B); B<-(NextRight B)
   c. (AddTen B); B<-(NextLeft B); test (Zero? B); (Decrement B); B<-(NextRight B); test (Home? B)
 The arrows connecting the boxes are not recoverable.]

Figure 9-2

A non-recursive representation of borrowing from zero.

Disjunction-free learning sanctions the acquisition of at most one branch, and this must be
one that adjoins the newly learned subprocedure to the older material. A formal definition of
branches and adjunction depends on the syntax of the language, but the essence of it should be
clear by examining the difference between figures 9-2a and 9-2b. Functionally, the difference is
that the core procedure of figure 9-2b has learned how to borrow from one zero. Syntactically,
there is one branch, and it is an adjoining branch because one arm of the branch skirts the new
material. The essence of adjunction is that one arm of the new conditional replicates the old
procedure's control pathway.

It should be clear that the transition from 9-2a to 9-2c requires adding two disjunctions and
thus violates the one-disjunct-per-lesson hypothesis. It is less clear, but equally true, that going
from 9-2b to 9-2c requires adding two disjunctions (plus deleting some material as well). Two
disjunctions must be introduced at exactly the lesson where the procedure goes from an ability to
borrow across some finite number of zeros, to an ability to borrow across an arbitrary number of
zeros. So, the finite state architecture forces the learner to violate the one-disjunct-per-lesson
hypothesis.

Yet, if the language allowed recursion, then the BFZ goal could be represented as in figure
9-3b, with a recursive call to itself (the heavy box labelled "Borrow"). This representation allows
the transition from nonzero borrowing (9-3a) to borrowing-from-zero (9-3b) to obey the
one-disjunct-per-lesson hypothesis. In short, the language must have a recursive control structure
so that a certain acquisitional transition obeys the one-disjunct-per-lesson hypothesis.
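
The difference in the number of branches can be seen by writing the two versions out. The
Python below is a sketch under assumptions rather than a transcription of the figures: the
function names are invented, the top row is held as a list of digits with the leftmost column
at index 0, and a nonzero digit is assumed to exist somewhere to the left of the borrowing
column. The operations correspond to the AddTen, NextLeft, Decrement, NextRight, Zero? and
Home? boxes of the flow charts.

```python
def borrow_two_loops(top, home):
    """Non-recursive borrowing: needs two branches, hence two loops."""
    top = list(top)
    b = home
    top[b] += 10               # AddTen
    b -= 1                     # NextLeft
    while top[b] == 0:         # branch 1 (Zero?): move leftward across the zeros
        top[b] += 10
        b -= 1
    top[b] -= 1                # Decrement
    b += 1                     # NextRight
    while b != home:           # branch 2 (Home?): move back rightward across them
        top[b] -= 1
        b += 1
    return top

def borrow_recursive(top, col):
    """Recursive borrowing: a single new branch plus a call to itself."""
    top = list(top)
    _borrow(top, col)
    return top

def _borrow(top, col):
    top[col] += 10             # AddTen
    left = col - 1             # NextLeft
    if top[left] == 0:         # the single branch (Zero?)
        _borrow(top, left)     # recursive call to Borrow
    top[left] -= 1             # Decrement
```

For the top row of 305 - 129, both borrow_two_loops([3, 0, 5], 2) and
borrow_recursive([3, 0, 5], 2) produce [2, 9, 15]. The recursive version gets the rightward
traversal for free from the returns of the recursive calls, which is why it needs only one
conditional.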



[Figure 9-3: two flow charts (a and b), not reproduced. Only fragments survive: a box labelled
Borrow and the box B<-(NextRight B).]

Figure 9-3

A recursive representation of borrowing from zero.

9.3 More powerful control structures

There are more powerful control regimes than stack-based ones. However, they introduce
extra flexibility into the way control can be expressed. This extra flexibility is not only unnecessary,
but it can cause the theory to make spurious, absurd predictions. As an example of the trouble
more powerful control regimes cause, consider a simple control regime: Coroutines are a control
structure that allows independent processes, each with their own stack. This control structure
increases the expressiveness of the language, which allows acquisition to generate absurd core
procedures. To demonstrate this point, consider the learning of simple borrowing. The
instructional sequence has a problem state sequence such as:



                   3            3            3
      4 15         4 15         4 15         4 15
    - 2  9       - 2  9       - 2  9       - 2  9
                                   6         1  6



Given that coroutines are allowed, one way to construe the first two actions, the new ones, is that
they are a new coroutine. It happens that the example has this coroutine executing before the old
one, but the learner need not take that as necessary. The core procedure could execute the
coroutines interleaved, as in



                                3            3
      4 15         4 15         4 15         4 15
    - 2  9       - 2  9       - 2  9       - 2  9
                      6            6         1  6



That is, the first action of the new coroutine occurs, then the first action of the old coroutine. Next,
the new coroutine resumes, and the borrow-from action occurs. Lastly, the old coroutine resumes,
and answers the tens column. There are other ways that the interleaving could happen. If the old
coroutine finishes before the new one, then the problem is answered incorrectly, because the
borrow-from happens after the answer is written down:

      4 15         4 15         4 15
    - 2  9       - 2  9       - 2  9
                      6         2  6

These are all ways of executing the same core procedure, one that was learned from an entirely
correct sequence of actions. This core procedure seems an absurd prediction to make, a star bug.
In short, the use of a more powerful control regime allows acquisition to generate core procedures
that it should not. Restricting the control regime to be a stack improves the empirical adequacy of
the theory by blocking the star bug.
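
The interleavings can be enumerated mechanically. The following Python sketch is an
illustration under assumptions, not part of the model: the action names, the run function and
the treatment of an unanswerable column as None are all invented for the example. It merges the
two new actions with the two old ones in every order that keeps each coroutine's own order, and
executes each merge on 45 - 29.

```python
from itertools import combinations

NEW = ["borrow_into", "borrow_from"]     # the newly taught actions
OLD = ["answer_units", "answer_tens"]    # the previously known actions

def interleavings(a, b):
    """Yield every merge of a and b that preserves the order within each."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(n)]

def run(actions, top=(4, 5), bottom=(2, 9)):
    """Execute the actions on 45 - 29; return the written answer, or None at an impasse."""
    tens, units = top
    answer = {}
    for act in actions:
        if act == "borrow_into":
            units += 10
        elif act == "borrow_from":
            tens -= 1
        elif act == "answer_units":
            if units < bottom[1]:
                return None              # cannot take 9 from 5
            answer["units"] = units - bottom[1]
        elif act == "answer_tens":
            if tens < bottom[0]:
                return None
            answer["tens"] = tens - bottom[0]
    return 10 * answer["tens"] + answer["units"]

results = [(order, run(order)) for order in interleavings(NEW, OLD)]
```

Of the six merges, the two in which both new actions precede the tens answer give 16, the
merge in which the old coroutine finishes before the borrow-from gives 26, and the three that
try to answer the units column before the borrow-into stop at an impasse in this sketch. A
stack-based regime admits only the taught order.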

9.4 Summary and formalization 

Two arguments have been presented showing that a goal stack is necessarily a part of the
execution state and that procedures should employ a goal hierarchy. A third argument made the
case that it was inadvisable to have a more flexible control structure than a simple stack-based one.
The conclusion is that procedures should be represented with a recursive control structure and that
the interpreter should use a goal stack.

There are many formal languages for representing procedures that have such a control
structure. Lisp is one. Certain varieties of production systems also use recursive control.
Unfortunately, I know of no way to formally specify a recursive control structure without also
providing a fairly detailed specification of the language. For easy reference later, the basic idea will
be recorded as an informal hypothesis, then a formal expression of it will be developed.

Recursive control structure 

Procedures have the power of push down automata in that the representation of
procedures permits goals to call themselves recursively, and the interpreter employs a
goal stack.

There are several reasons for going beyond this informal expression and providing a formal
description of the control structure. First, a formal description adds clarity not only to the
description of the procedural representation but also to the other components of the model, such as
repairs and deletion, which manipulate procedures and execution states. A second reason for the
extra work of formalization is to bring up some tacit issues that are inherent in a recursive control
regime. There are two such tacit issues. One concerns how the interpreter should choose which
subgoal of a goal to run. The other concerns how the interpreter should decide when a goal is
satisfied and may thus be popped from the goal stack.

To begin, a vocabulary is needed to speak of the static structure of procedures. For the 
purpose of control structure, only a few terms are needed: 

goal:                      A procedure has tokens (names) called goals.

rule:                      The subgoals of a goal are represented as a set of rules
                           "under" that goal.

applicability conditions:  The conditions under which a subgoal may be executed are
                           separated from it and used as the applicability condition of
                           the corresponding rule.

action:                    The other half of a rule is the name of the subgoal. In
                           keeping with production system terminology, this is called the
                           rule's action (or sometimes its subgoal).

When necessary for illustrations, the following syntax will be used: 

Goal: Borrow-from

   1. T=0 in the current column  →  BFZ
   2. T≠0 in the current column  →  Decrement

The goal is named Borrow-from. It has two rules. The rule numbers are used only as labels. Each
rule has an applicability condition to the left of the arrow and an action (goal name in this case) to
the right. Nothing important depends on this syntax.

A goal with no rules under it is called a primitive goal. Primitive goals are at the grain-size 
boundary of the procedural representation. They represent actions, such as moving the hand to
write a digit, that are beneath the grain size of the model. In order to deal with them, the
interpreter is equipped with a special operation, Eval, that may be applied to a primitive goal and
the current runtime state. Such a call will change the external (problem) state, but it will not
change the internal (execution) state. Managing the execution state is the interpreter's job, with
some help from the local problem solver, of course.

Given the notion of goals, the internal (execution) state of the interpreter can be defined. It
cannot simply be a stack of goals. It must have a little more information. The extra information is
needed to execute conjunctive goals, such as Borrow, which perform several rules before popping.
The extra information indicates which of the goal's rules have already been executed. This prevents
the conjunctive goal from deciding to run the same rule over and over again. The easiest way to
add this extra information to the execution state is to stipulate that the stack holds pairs consisting
of (1) a goal, and (2) the subset of the goal's rules that have already been run. When a goal is first
pushed onto the stack, the set of executed rules will, of course, be empty. There is nothing special
going on here. Any recursive language's interpreter would have to have some such information
(cf. the refractoriness principle for conflict resolution in production systems, McDermott and
Forgy, 1978).

There is one other kind of extra information that must be a part of the execution state. It is
not a part of the stack. At minimum, only a single bit is needed. Basically, it remembers whether
the interpreter's last action was a push or a pop. The reason this extra bit of control state is needed
is that the Backup repair is impossible to implement without it. Backup pops the stack back to an
OR goal (OR goals will be formally introduced in the next chapter). Normally, when control pops
back to an OR, that goal immediately pops. Since OR goals only execute one rule, and a rule was
just executed, the OR goal is done and should be popped. However, Backup needs to reset the
execution state so that the OR will be resumed. The idea behind Backup is to take some alternate
rule to the one that led to the impasse that it just repaired. The problem is how to tell the
interpreter not to pop the OR but instead to pick another rule and execute it. Backup cannot
change the set of tried rules associated with the goal. These must be left alone so that Backup will
not cause the interpreter to take the impasse-causing rule over again. Since the stack will not
suffice for Backup to communicate with the interpreter, a new bit of execution state is needed. This
bit will be called microstate.

Formally, defining the interpreter means defining the function Interpret. In section 6.5, 
Interpret was introduced as an undefined function whose input is a runtime state and the core 
procedure. Its output is the next runtime state. Given the concepts introduced so far, it is easy to 
define Interpret. Some nomenclature is needed: 

(Goal-TOS S)            The goal on the top of the stack in the runtime state S.

(ExecutedRules-TOS S)   The set of executed rules on the top of the stack of the runtime state S.

(Microstate S)          The microstate of the runtime state S.

(Push S G RS)           Changes the runtime state S by pushing the goal G and the rule set RS
                        onto the stack. Also, it changes the microstate of S to Push.

(Pop S)                 Pops the runtime state's stack and sets its microstate to Pop.

(Eval G S)              Evaluates the goal G, which must be primitive. This changes only the
                        external component (i.e., problem state) of runtime state S.

These are defined functions that access and change the runtime state in simple ways. However, two
new functions are needed which will remain undefined until the next chapter:

(PickRule S)            Given the current runtime state S, PickRule chooses one of the top
                        goal's rules as the next rule to execute and returns it. Typically,
                        PickRule tests the applicability conditions of the rules of the top
                        goal G that have not yet been executed. It finds the rules whose
                        conditions are true in the current state. These are the applicable
                        rules for the current instantiation of the goal. If several rules are
                        applicable, PickRule applies criteria, called conflict resolution
                        strategies, to choose which of the applicable rules to return. The
                        next chapter specifies the conflict resolution strategies, thus
                        defining the function.

(ExitGoal? S)           This predicate is true when the runtime state S is such that the current
                        goal should be popped.

The definition of Interpret is shown in figure 9-4. There are two basic cases: (1) If microstate
is Push, then the current goal has just been started up. If it is a primitive goal, then the interpreter
just executes it; otherwise, a rule is chosen and executed. (2) If microstate is Pop, then the current
goal has just had one of its rules executed. The choice is between resuming it, by choosing a rule
and executing it, or exiting the goal by popping the stack.
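
The two cases can be sketched in Python as follows. This is only an illustrative sketch, not the model's implementation. It assumes that a rule is identified with the name of its action goal, that a goal is primitive when it has no entry in the procedure, and that the problem state is a simple trace list. The functions `first_unexecuted` and `all_executed` are hypothetical stand-ins for PickRule and ExitGoal?, which the text leaves undefined until the next chapter.

```python
from dataclasses import dataclass, field

@dataclass
class Frame:
    goal: str
    executed: set = field(default_factory=set)   # ExecutedRules-TOS

@dataclass
class State:
    stack: list = field(default_factory=list)
    microstate: str = "Push"
    trace: list = field(default_factory=list)    # stand-in for the problem state

def push(s, goal):
    s.stack.append(Frame(goal))                  # new frame starts with no executed rules
    s.microstate = "Push"

def pop(s):
    s.stack.pop()
    s.microstate = "Pop"

def execute_rule(s, rule):
    s.stack[-1].executed.add(rule)
    push(s, rule)                                # assumption: a rule is named by its action goal

def interpret(s, procedure, pick_rule, exit_goal):
    """Advance the runtime state by one step (the two cases of Interpret)."""
    top = s.stack[-1]
    if s.microstate == "Push":
        if top.goal not in procedure:            # primitive goal
            s.trace.append(top.goal)             # Eval: changes only the problem state
            pop(s)
        else:
            execute_rule(s, pick_rule(s, procedure))
    elif exit_goal(s, procedure):
        pop(s)
    else:
        execute_rule(s, pick_rule(s, procedure))

# Hypothetical stand-ins for PickRule and ExitGoal?.
def first_unexecuted(s, procedure):
    top = s.stack[-1]
    return next((r for r in procedure[top.goal] if r not in top.executed), None)

def all_executed(s, procedure):
    return first_unexecuted(s, procedure) is None

def run(goal, procedure):
    s = State()
    push(s, goal)
    while s.stack:
        interpret(s, procedure, first_unexecuted, all_executed)
    return s.trace
```

For example, `run("Borrow", {"Borrow": ["Borrow-from", "Add10", "Diff"]})` evaluates the three primitives in order. The stand-in exit test pops a goal only when all of its rules have been executed, which is just one possible convention.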

Local problem solving and the execution state 

As discussed in section 6.5, the local problem solver and the interpreter take turns examining
the runtime state and modifying it. Although it is premature to venture a complete definition of
how repairs and impasse conditions are implemented, it is interesting to sketch a few of them.

There is an impasse condition that checks the preconditions of goals. Preconditions need to 
be checked just before a goal is executed. This is easily done using microstate. Expressed 
informally, the impasse condition is a three-part conjunction: 

1. (Microstate S)=Push, and 

2. there exists a precondition C for (Goal-TOS S), and

3. C is false in S. 

    If (Microstate S) = Push then
        If (Goal-TOS S) is primitive then
            1. (Eval (Goal-TOS S) S)
            2. (Pop S)
        else (ExecuteRule S (PickRule S))
    else
        If (ExitGoal? S) then (Pop S)
        else (ExecuteRule S (PickRule S)).

    where

    (ExecuteRule S R) =
        1. Add R to (ExecutedRules-TOS S)
        2. (Push S (ActionOf R) {})

Figure 9-4
Definition of Interpret.

Another impasse condition detects when the interpreter would halt because no rules apply. Its
expression is also a conjunction with three conjuncts:

1. (Goal-TOS S) is not a primitive goal, and

2. either (Microstate S) = Push or (ExitGoal? S) = false, and

3. (PickRule S) returns no rule.



The idea is that the halt impasse condition first checks to see if PickRule will be called by
the interpreter. If so, it calls PickRule itself to see if it is able to find a rule that is applicable. If
not, then the procedure is stuck, the impasse condition is true, and a halt impasse occurs.
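
Both impasse conditions are simple predicates over the runtime state, and a minimal Python sketch makes the conjunctions explicit. The argument names are assumptions made for illustration: `holds` tests a precondition in the current state, and `exit_goal` and `pick_rule` are zero-argument callables standing in for (ExitGoal? S) and (PickRule S).

```python
def precondition_impasse(microstate, preconditions, holds):
    """True if the goal was just pushed and some precondition of it is false."""
    return microstate == "Push" and any(not holds(c) for c in preconditions)

def halt_impasse(is_primitive, microstate, exit_goal, pick_rule):
    """True if the interpreter is about to call PickRule and no rule applies."""
    if is_primitive:
        return False
    # The interpreter calls PickRule when the goal was just pushed,
    # or when it was resumed and ExitGoal? says not to pop it.
    will_pick = microstate == "Push" or not exit_goal()
    return will_pick and pick_rule() is None
```

Note that the halt condition only consults `pick_rule` when the interpreter itself would, which mirrors the order of the three conjuncts.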

Repairs are equally simple to express. The Noop repair is virtually trivial. It simply calls
(Pop S). This simulates a return from the current goal. The Backup repair is only a little more
complex. It calls (Pop S) until (Goal-TOS S) is a disjunctive goal that has some unexecuted
rules. Then it sets microstate to Push. Since microstate is Push, the interpreter will wind up calling
PickRule and executing another of the goal's rules. If Backup left microstate at Pop, then the
interpreter would call ExitGoal? and probably wind up popping the stack that Backup so
carefully adjusted. Microstate, or something like it, is crucial to expressing the Backup repair.

The previous chapter dealt with control flow from a broad perspective. It contrasted three
control regimes (finite state, recursive, and coroutine) that permeate the whole of the procedure
representation language. There is another perspective on control flow, a more narrowly focused
one, but one that is equally familiar to people designing or learning computer programming
languages: what control constructions does the language allow? For instance, does the language
support any of these:

if-then-else
Case or SelectQ
COND

AND
OR

While or Until loops

PROG or BEGIN...END blocks

"For I from 1 to N do...."
"For each X in the set S do...."
"Find X in the set S such that...."

The equivalent of such questions can be meaningfully asked of the language used to represent
human procedures. Given that "goal" is the term used for a group of related subgoals, the question
asks what goal types or goal schemata exist. For a procedure learning theory, postulating a goal
type of the representation language means that the students have a prior expectation that a certain
pattern of control will be common. If the learning theory postulates a bias (simplicity metric) of a
procedure inducer, then hypothesizing a goal type amounts to parameterizing the bias so that the
inducer tends to view examples as having a certain pattern of control, the one expressed by the goal
type, rather than construing the examples as exhibiting other flows of control. Later, in the Bias
level of this document, several inductive biases will be postulated. So the issue of goal types is an
important one for step theory.

The same comments apply to repair theory. Both local problem solving and deletion are
affected by the goal-subgoal hierarchies of the core procedures that they operate on. Since goal
types can affect those structures, the existence and identity of goal types can impact repair theory.

There is, however, an inherent methodological difficulty in determining which goal types exist.
Most goal types are redundant in that their pattern of control can be expressed without them, albeit
less concisely. In fact, a single goal type suffices to express all the others. As proof, one can offer
production systems. A typical production system uses just one goal type. For instance, a goal like
the following one acts like a PROG (in Lisp) or a BEGIN...END block (in Algol):

Goal: Regroup (T)

1. {} => (Borrow-into T)

2. {} => (Borrow-from (Top-of-next-column T))

(In this example and the ones following it, a few inconsequential conventions have been adopted.
Each goal has its subgoals listed as production rules beneath it, numbered for convenience only.
The conditions governing when a rule can be executed are in braces on the left of the arrow. The
rule's action is on the right. Arguments, such as T above, are treated as in Lisp or Algol. Rules are
tested in order; the first one whose conditions are true is run, except rules that have been run
already under the current invocation of the goal may not be run again. This convention
corresponds to two common conflict resolution strategies for production systems, recency and
refractoriness (McDermott & Forgy, 1978). Nothing in the following argument depends on the
adoption of these conventions.) The goal above acts like a PROG because it executes both subgoals,
Borrow-into and Borrow-from, and it does so in a fixed sequence. Other well-known goal types can
also be emulated by production systems. A goal like:

Goal: BorrowFrom (T)

1. {(Zero? T)} => (BorrowFromZero T)

2. {(Not (Zero? T)) (Not (CrossedOut? T))} => (Decrement T)

acts like an if-then-else in that it only executes one of its subgoals depending on whether T is zero
or not. The following goal acts like a loop:

Goal: Multi (C)

1. {} => (Sub1Col C)

2. {(Exists? (NextColumn C))} => (Multi (NextColumn C))

It does the "loop body," the subgoal Sub1Col, then tests for termination. It calls itself tail-
recursively if it is not done yet. In short, a single goal type suffices to express many kinds of
control.
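
The convention described above (rules tested in order, no rule run twice under one invocation, and the goal exiting when no rule can run) can be sketched as a tiny interpreter in Python. This is an illustrative sketch only. Columns are represented by integer indices counted from the units column, `TOP_DIGITS` is a hypothetical top row, primitives merely log their calls rather than changing the problem, and the CrossedOut? test of the BorrowFrom goal is omitted for that reason.

```python
def run_goal(goals, name, arg, log):
    """Run one invocation of a goal under a single, homogeneous goal type."""
    if name not in goals:                  # primitive: just record the call
        log.append((name, arg))
        return
    done = set()                           # rules already run under this invocation
    while True:
        for i, (cond, action, next_arg) in enumerate(goals[name]):
            if i not in done and cond(arg):
                done.add(i)
                run_goal(goals, action, next_arg(arg), log)
                break                      # re-test the rules from the top
        else:
            return                         # no runnable rule: the goal exits

TOP_DIGITS = [7, 0, 3]                     # hypothetical top row, units column first
always = lambda c: True
same = lambda c: c
left = lambda c: c + 1                     # the next column to the left

GOALS = {
    "Regroup": [(always, "Borrow-into", same),
                (always, "Borrow-from", left)],
    "BorrowFrom": [(lambda c: TOP_DIGITS[c] == 0, "BorrowFromZero", same),
                   (lambda c: TOP_DIGITS[c] != 0, "Decrement", same)],
    "Multi": [(always, "Sub1Col", same),
              (lambda c: c + 1 < len(TOP_DIGITS), "Multi", left)],
}

def trace(name, arg):
    log = []
    run_goal(GOALS, name, arg, log)
    return log
```

Here `trace("Regroup", 0)` runs both subgoals in sequence, `trace("BorrowFrom", 1)` runs only one of its two subgoals, and `trace("Multi", 0)` visits every column, so the same rule-selection loop yields PROG-like, if-then-else-like, and loop-like behavior.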

All production systems that I know of use just one goal type. Exactly which goal type is used
is different in different production systems since, in production systems, the goal type's behavior
depends on the production system's conflict resolution strategies (McDermott & Forgy, 1978, review
a variety of conflict resolution strategies). Nonetheless, the principle of homogeneous goal types is
so widely adhered to that it could even be taken as the defining characteristic of production systems.

Over the years, many procedures have been expressed in production systems. In some sense,
this constitutes proof that the homogeneous goal type principle cannot be refuted on grounds of
inexpressiveness. It does not limit the language so much that it becomes impossible to express some
procedures. Consequently, any challenge to the homogeneous goal type principle will have to be
made on more subtle grounds. As it turns out, the distinctions that will be used in this chapter are
vanishingly subtle. For instance, it was mentioned that goal types affect inductive biases. Actually,
they only affect the elegance of such biases. If the interpreter can distinguish different control
flows, then so can the biases, although they might have to simulate execution of the procedure in
order to do so. If students are biased towards forming loops, say, but the language doesn't have a
loop goal type, then the theorist can write a bias that captures the students' predilections by, e.g.,
having the bias check for tail-recursive calling paths in the goal-subgoal hierarchy. This would
make it an ugly, complicated bias, but it would capture the students' cognition. In short, the only
way the existence of goal types will show up is in the parsimony of the theory. This is the
methodological problem mentioned earlier. The goal type issue will be settled only by weak
parsimony-based arguments.

This chapter considers three hypotheses about goal types. The first one is the homogeneous
goal type principle. The second is the goal type principle that is used in the current version of the
theory. The third is the goal type principle that was used in an early version of repair theory
(Brown & VanLehn, 1980).






1. And: A goal is popped when all applicable subgoals have been tried. A subgoal is applicable
   if, e.g., the conditions on the left side of its rule are true.

2. And-Or: Goals have a binary type. If a goal's type is AND, all applicable subgoals are
   executed before the goal is popped. If it is OR, the goal pops as soon as one subgoal is
   executed.

3. Satisfaction conditions: Goals have a condition which is tested after each subgoal is executed.
   If the condition is true, the goal is popped. Metaphorically speaking, the goal keeps trying
   different subgoals until it is satisfied.
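
The three hypotheses differ only in when a resumed goal is popped, so each can be read as a different exit test. The following Python sketch states them side by side. The argument names are assumptions made for illustration: `n_executed` counts the rules already executed under the goal, `untried_applicable` says whether any applicable rule is still untried, and `satisfied` is the value of the goal's satisfaction condition.

```python
def should_pop(convention, goal_type, n_executed, untried_applicable, satisfied):
    """Exit test for a resumed goal under each of the three hypotheses."""
    if convention == "And":
        return not untried_applicable          # pop when nothing applicable is left
    if convention == "And-Or":
        if goal_type == "OR":
            return n_executed >= 1             # pop as soon as one subgoal has run
        return not untried_applicable          # AND goals behave as under "And"
    if convention == "Satisfaction":
        return satisfied                       # pop only when the condition is true
    raise ValueError(f"unknown convention: {convention}")
```

Under the satisfaction convention, a goal that is unsatisfied and has no rule left to try is not popped by this test. In this sketch that situation is left to the halt impasse condition.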

As it turns out, support for a fourth hypothesis has been recently discovered. The hypothesis
extends the And-Or hypothesis by adding a third goal type, a loop across elements in a set, e.g.,
"Foreach x in the set S do ...." Evidence for it was discovered when Sierra traversed the Heath
lesson sequence. This traversal, and the resulting core procedure tree, were discussed in section 2.8.
Two branches of the core procedure tree (with suffixes P100 and Blk in figure 2-14) led to star
bugs. The root of these mispredictions is the following: It appears that students can solve multi-
column problems after seeing examples with at most two columns. But Sierra doesn't form the
multi-column loop (actually, a tail recursion) until it is given three-column examples. For Sierra to
form the loop on two-column examples would require biasing the learner in favor of loops. This is
most easily done by giving the knowledge representation language a loop goal type. Although I
won't recount more details here, it appears that the P100 star bugs would be avoided if (1) the
procedure language had a Foreach goal type, and (2) Sierra's learner was biased towards it.
Unfortunately, it requires much work to install this And-Or-Foreach hypothesis in Sierra, and I
have not done so yet. Until that is done, there is no way to know whether the new hypothesis has
some unfortunate side-effect that would cancel or outweigh its apparent benefits. Consequently,
only the three hypotheses listed above will be given active consideration in this chapter.

There are two parts to the arguments of this chapter. The first part shows that the And-Or
hypothesis is better than the And hypothesis. There are three separate arguments for this point,
none of which is particularly convincing on its own. However, the fact that all of them point to the
same conclusion provides support for the adoption of the And-Or hypothesis over its homogeneous
cousin. The second part of the chapter is a competition between And-Or goal types and satisfaction
conditions. It will be shown that satisfaction conditions provide better empirical coverage with
respect to deletion, but they require unmotivated assumptions about learning. This argument is
quite complex, so most of it has been moved to an appendix. Only a synopsis is presented in this
chapter.

10.1 Assimilation is incompatible with the And hypothesis

One problem with the And hypothesis is due to the cumbersome way that disjunctive goals
must be expressed. To express the fact that two subgoals are mutually exclusive, one must put
mutually exclusive applicability conditions on them. For example, to express the fact that there are
two mutually exclusive ways to process a column, depending on whether it is a two-digit column or a
one-digit column, one would write:

Goal: process-column (C):

1. (blank? (bottom C)) => (bring-down-top C)

2. (not (blank? (bottom C))) => (take-difference C)

Both rules must have applicability conditions in order that they be mutually exclusive. On a
problem whose units column has a bottom digit and whose tens column has a blank bottom,
the first rule must be prevented from applying to the units column, so its applicability condition is
necessary. The second rule's applicability condition is necessary to prevent it from applying to the
tens column. Because the And hypothesis tries to execute all subgoals, one can only get mutual
exclusion by using mutually exclusive applicability conditions.

This implies that assimilating a new alternative method of accomplishing a goal may involve
rewriting the applicability conditions of the existing subgoals. If the applicability conditions are not
changed, then the new subgoal will not turn out to be mutually exclusive of the old subgoal. For
example, to assimilate a new method of processing columns, say one that handles columns whose
top and bottom digits are equal (i.e., the rule N-N=0), one would have to modify the above goal
to become:

Goal: process-column (C):

1. (= (top C) (bottom C)) => (write-zero-in-answer C)

2. (blank? (bottom C)) => (bring-down-top C)

3. (and (not (blank? (bottom C)))
        (not (= (top C) (bottom C)))) => (take-difference C)

Adding the new subgoal forced the applicability conditions of one of the existing subgoals to be
changed (the second conjunct of rule 3 was added). The essential point here is that the And hypothesis
forces mutually exclusive alternatives to be highly interdependent. This lack of modularity forces
learning to modify existing material even though that material's function has not changed. There
are some potential drawbacks to this.
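
The interdependence can be seen by running the rules under the And convention, where every rule whose condition holds is executed. In the following Python sketch, a column is a hypothetical (top, bottom) pair with None for a blank bottom, and rules are (condition, action-name) pairs. These representations are assumptions made for illustration only.

```python
def fired(rules, column):
    """And convention: every rule whose applicability condition holds is executed."""
    top, bottom = column
    return [action for cond, action in rules if cond(top, bottom)]

blank = lambda t, b: b is None

old_rules = [
    (blank, "bring-down-top"),
    (lambda t, b: not blank(t, b), "take-difference"),
]
new_rule = (lambda t, b: t == b, "write-zero-in-answer")

# Adding the new rule without touching the old ones: two rules fire on N-N columns.
naive = [new_rule] + old_rules

# Restoring mutual exclusion means rewriting the old take-difference condition.
revised = [
    new_rule,
    old_rules[0],
    (lambda t, b: not blank(t, b) and t != b, "take-difference"),
]
```

With these definitions, `fired(naive, (5, 5))` contains both write-zero-in-answer and take-difference, whereas `fired(revised, (5, 5))` contains only write-zero-in-answer. The old rule had to change even though its function did not.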

Mathematical skill acquisition is clearly incremental. New components of a skill are slowly
added. Knowledge accrues, rather than springing full grown from some catalytic experience. Such
plodding, slow-growing learning is often called assimilation to differentiate it from learning that
takes the form of a radical restructuring of the student's knowledge. However, the hypotheses that
have been accepted so far do not rule out such radical restructurings. In particular, the one-
disjunct-per-lesson hypothesis rules out adding extra disjuncts but it says nothing about augmenting
existing disjuncts, e.g., by adding conjuncts to applicability conditions, as in the example above. To
capture the apparent quality of mathematical skill acquisition, the following hypothesis was used in
an earlier version of the theory (VanLehn, 1983):

Assimilation

Disjoin only adds a new disjunct (subprocedure). It does not modify the existing
knowledge structure in any other way.

This hypothesis says that subprocedure acquisition is an additive action. Learning doesn't
change the old structure, it only adds a new chunk (disjunct, subprocedure). This hypothesis is yet
another felicity condition on learning. It amounts to a guarantee to the students that what they
learned earlier will remain valid and useful. To put it differently, the curriculum is arranged to be
efficient. It doesn't teach a concept or a skill unless it will remain useful, possibly as the basis for
further development of concepts or skills. As with the other felicity conditions, evidence in its favor
is that current lesson sequences are constructed so that the correct procedure can be learned without
violating the felicity condition. That, plus its inherent plausibility, are the only known support.
This is not sufficient, to my mind, so the assimilation hypothesis is not included in the current
theory. Yet, if it were in the theory, as intuition dictates it should be, it would have entailments for
the goal type controversy. The assimilation conjecture is incompatible with the And hypothesis. As
we just saw, the And hypothesis forces too much of an existing core procedure to be changed in
order to assimilate a new subprocedure. In short, to the extent that one accepts the assimilation
conjecture, one must also reject the And hypothesis.

10.2 The And-Or hypothesis simplifies the Backup repair

There is a minor advantage to the And-Or hypothesis. Using both AND goals and OR goals
simplifies the Backup repair. The function of the repair is to pop the goal stack back to the first
goal that has some alternatives left to try. When goals are typed, it is trivial to tell whether a goal
has any alternatives left. If it is an AND goal, by definition it does not. If it is an OR goal, then
only one of its alternatives has been tried (because it normally pops after trying one subgoal), so all
the rest must be open. Backup's search becomes trivial: pop the stack back to the first OR goal.
That this simplicity falls out of the And-Or exit convention is weak evidence that the binary
distinction is somehow a natural one for the procedural representation language to make.
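
With typed goals, Backup reduces to a short loop over the stack. The Python sketch below is one hedged reading of the repair: stack frames are hypothetical dictionaries holding the goal's name, its type, its rules, and the set of rules already executed, and the function returns the microstate that Backup would set so that the interpreter resumes the OR goal instead of popping it.

```python
def backup(stack):
    """Pop to the nearest OR goal with an untried rule; return the new microstate."""
    def resumable(frame):
        return frame["type"] == "OR" and bool(set(frame["rules"]) - frame["executed"])
    while stack and not resumable(stack[-1]):
        stack.pop()
    # Push tells the interpreter to pick another rule rather than exit the goal.
    # The executed set is left untouched, so the impasse-causing rule is not retaken.
    return "Push" if stack else None

# Hypothetical stack at an impasse inside borrowing (bottom of the stack first).
stack = [
    {"goal": "Sub1Col", "type": "OR",
     "rules": ["Borrow", "Show", "Diff"], "executed": {"Borrow"}},
    {"goal": "Borrow", "type": "AND",
     "rules": ["Borrow-from", "Add10", "Diff"], "executed": {"Borrow-from"}},
    {"goal": "Decrement-top", "type": "AND", "rules": [], "executed": set()},
]
microstate = backup(stack)
```

After the call, only the Sub1Col frame remains, its executed set still records Borrow, and the returned microstate is Push.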

10.3 Rule deletion needs the And-Or distinction

It was shown in chapter 7 that a certain group of bugs, labelled the deletion bugs, seem to
require some kind of deletion operator in order for the theory to generate them. To generate them
with incomplete learning and/or local problem solving would necessitate expanding the power of
those mechanisms to an unacceptable degree.

This chapter picks up the deletion story and considers how deletion can be formalized. The
first step is to show that simply deleting rules will suffice to generate the deletion bugs. The next
step will be to show that not all rules should be subject to deletion. If certain rules are deleted,
then star bugs are generated. To avoid such absurd predictions, some constraints must be placed on
the rule deletion operator. This is where the goal type controversy comes in. It can be shown that
the rules that should not be deleted are exactly the ones under disjunctive goals. Once again, the
distinction between AND goals and OR goals arises naturally. If the representation uses a binary
type to distinguish the two goal types, then the deletion operator becomes trivially simple. Without
it, the operator must examine the applicability conditions of rules to infer which goals are
disjunctive ones. The fact that a simple statement of the deletion operator falls out under the And-
Or hypothesis is more converging evidence that it is the right one for the knowledge representation
to employ.

In order to discuss how to generate the deletion bugs, it helps to have a concrete expression of a
subtraction procedure so that deletion may act upon it. Figure 10-1 shows a particular subtraction
procedure, and gives a brief explanation of it. The procedure is expressed in a rule-oriented syntax.
Since the issue of goal types is still under discussion, and the applicability conditions depend on the
exit convention in use, the expression of figure 10-1 uses an informal representation for
applicability conditions in order to stay neutral.

Given this particular procedure for subtraction, some general points about deletion can be
made. For illustration, suppose that deletion is formalized as deleting a rule, any rule. Since there
are 18 rules in the procedure, there are 18 possible deletions. (This is not quite a straw man, by the
way, since it can be taken as one formulation of Young and O'Shea's approach (1981) to generating
bugs.) The first point to make is that rule deletion does indeed generate the set of deletion bugs.

Goal: Sub (P)

1.  P is a multi-column problem => (Multi (Rightmost-column P))

2.  P is a single-column problem => (Diff (Rightmost-column P))

Goal: Multi (C)

3.  true => (Sub1Col C)

4.  true => (Sub-rest (Next-column C))

Goal: Sub-rest (C)

5.  C is not the leftmost column => (Multi C)

6.  C is the leftmost column, and its bottom is blank => (Show C)

7.  C is the leftmost column, and its bottom is not blank => (Diff C)

Goal: Sub1Col (C)

8.  T<B in C => (Borrow C)

9.  the bottom of C is blank => (Show C)

10. C is normal: T≥B and its bottom is not blank => (Diff C)

Goal: Borrow (C)

11. true => (Borrow-from (Next-column C))

12. true => (Add10 C)

13. true => (Diff C)

Goal: Borrow-from (C)

14. T=0 in C => (BFZ C)

15. T≠0 in C => (Decrement-top C)

Goal: BFZ (C)

16. true => (Borrow-from (Next-column C))

17. true => (Add10 C)

18. true => (Decrement-top C)

The Sub goal simply chooses between trivial one-column processing and the usual multiple-column
procedure. The two goals, Multi and Sub-rest, express a loop across columns as a tail recursion.
Next-column is a function that takes a column and returns the next column to the left. If the
column C that is given to Sub-rest is the leftmost column, Sub-rest answers it with either rule 6 or
rule 7, thus terminating the recursion. Show and Diff are primitives. Show writes the top digit of a
column as its answer. Diff takes the column difference. Columns other than the leftmost column
are answered by Sub1Col. Sub1Col is the main column processing goal. If the top digit of the
column is less than the bottom digit (i.e., T<B in the shorthand used throughout this document),
then Sub1Col calls borrowing (rule 8). If the column has a blank instead of a bottom digit, then it
simply writes the top digit as the answer (rule 9). Otherwise, it does the usual take-difference
operation. The Borrow goal first borrows from the next column to the left, then adds ten to the
column originating the borrow. Add10 is a primitive that adds ten to the top digit of the column it is
given. Borrow winds up by taking the difference in the original column. There are two ways to
achieve the Borrow-from goal. If the column's top digit is non-zero (rule 15), then the procedure
simply decrements it by one (Decrement-top is a primitive). If the column's top digit is zero, then
Borrow-from calls BFZ. The goal BFZ first borrows from the next digit to the left (i.e., if the
borrow originated in the units, this would be a borrow from the hundreds column), then adds ten to
the current column. Since the top digit was zero, this means the top digit will become a ten. The
next rule, 18, decrements this ten to a nine. This finishes the BFZ.

Figure 10-1
A subtraction procedure, presented in an informal rule-oriented syntax, with explanation.

Deleting rule 11 generates the following deletion bug:

Borrow-No-Decrement:      345        345        307
                        - 102      - 129      - 169
                          243 √      226 X      248 X

This bug does only the borrow-into half of borrowing. It omits the borrow-from half. It is a
deletion bug because its familiarity with borrowing makes it likely that the students with this bug
have been taught simple borrowing. However, part of borrowing apparently did not sink in or it
was forgotten. Another deletion bug results from deleting rule 16:

Borrow-From-Zero:         345        345        307
                        - 102      - 129      - 169
                          243 √      216 √      238 X

This bug only does part of borrowing across zero. It changes the zero to ten then to nine, but does
not continue borrowing to the left. Because it does do part of borrowing across zero, it is likely that
subjects with this bug have been taught borrowing across zero. It is also clear that they did not
acquire all of the subprocedure, or else they forgot part of it. If the subtraction curriculum was
such that teachers first taught one half of borrowing across zero, and some weeks later taught the
other half, then one would be tempted to account for this bug with incomplete learning. But
borrowing from zero is, in fact, always taught as a whole. So some other formal technique,
deletion, is implicated in this core procedure's generation.

A third deletion bug is generated by deleting rule 18. It seems to forget to do the second
decrement in the borrow across zero:

Don't-Decrement-Zero:     345        345        307        307
                        - 102      - 129      - 169      -   9
                          243 √      216 √      148 X    2 10 8 X

This shows up rather clearly in the last problem. The zero has had a ten added to it, but it has not
been decremented as it should be to complete the borrow from zero. Consequently, the answer in
the tens column is a two-digit number. (This sometimes triggers a second impasse.) A more
detailed description of Don't-Decrement-Zero appears in section 7.1.

There are other deletion bugs, but these three suffice to show that rule deletion is a
productive addition to the theory. These three bugs will be used as a yardstick to measure the
empirical adequacy of the various kinds of deletion that will be discussed.
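
The effect of each deletion can be checked mechanically by transcribing the procedure of figure 10-1 into code and skipping the deleted rule. The Python sketch below does this under simplifying assumptions that are not part of the theory: digits are stored units-first, a blank bottom digit is None, the informal applicability conditions are turned into ordinary tests, and a goal with no runnable rule simply returns instead of triggering an impasse and repair. The rule numbers in the comments are those of figure 10-1.

```python
def subtract(top, bottom, deleted=frozenset()):
    """Run the figure 10-1 procedure, skipping any rule number in `deleted`."""
    n = len(str(top))
    T = [int(d) for d in reversed(str(top))]            # top digits, units first
    bs = str(bottom)
    B = [int(d) for d in reversed(bs)] + [None] * (n - len(bs))
    ans = [None] * n                                     # None = column left unanswered
    live = lambda rule: rule not in deleted

    def diff(c): ans[c] = T[c] - B[c]                    # primitive Diff
    def show(c): ans[c] = T[c]                           # primitive Show

    def sub1col(c):                                      # rules 8-10
        if B[c] is None:
            if live(9): show(c)
        elif T[c] < B[c]:
            if live(8): borrow(c)
        elif live(10): diff(c)

    def borrow(c):                                       # rules 11-13
        if live(11): borrow_from(c + 1)
        if live(12): T[c] += 10                          # Add10
        if live(13): diff(c)

    def borrow_from(c):                                  # rules 14-15
        if T[c] == 0:
            if live(14): bfz(c)
        elif live(15): T[c] -= 1                         # Decrement-top

    def bfz(c):                                          # rules 16-18
        if live(16): borrow_from(c + 1)
        if live(17): T[c] += 10
        if live(18): T[c] -= 1

    def multi(c):                                        # rules 3-4
        if live(3): sub1col(c)
        if live(4): sub_rest(c + 1)

    def sub_rest(c):                                     # rules 5-7
        if c < n - 1:
            if live(5): multi(c)
        elif B[c] is None:
            if live(6): show(c)
        elif live(7): diff(c)

    if n > 1:
        if live(1): multi(0)                             # rule 1
    elif live(2): diff(0)                                # rule 2
    return list(reversed(ans))                           # answer digits, leftmost first
```

For instance, `subtract(345, 129)` gives the correct digits 2, 1, 6, while `subtract(345, 129, {11})` gives 2, 2, 6, the Borrow-No-Decrement answer. Deleting rule 18 on 307 - 9 leaves a two-digit answer in the tens column.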



Unconstrained rule deletion overgenerates

The problem with unconstrained rule deletion is that it overgenerates. About half the rule
deletions generate star bugs. Such star bugs must be blocked if the theory is to have any empirical
worth at all. The issue is how to constrain rule deletion. A bird's-eye view of the issue is provided
by figure 10-2, which lists each rule along with what its deletion results in (roughly). Some
deletions cause star bugs, some cause observed bugs, and some cause bugs that have not yet been
observed but are plausible predictions for future observations.

Goal: Sub

1.  predicted bug: only does single column problems.

2.  star bug: can't do single column problems but does others perfectly.

Goal: Multi

3.  predicted bug: only does the leftmost column.

4.  star bug when BFZ exists: does units column only, but perfectly, even if BFZ is required.
    predicted bug if Borrow not yet learned: does units column only, taking absolute difference.

Goal: Sub-rest

5.  predicted bug: only does leftmost column.

6.  predicted bug: forgets leftmost column when it has a blank bottom.

7.  star bug: leaves leftmost column unanswered unless it has a blank bottom.

Goal: Sub1Col

8.  various observed bugs: e.g., Smaller-From-Larger, Zero-Instead-of-Borrow.

9.  various observed bugs: e.g., Quit-When-Bottom-Blank, Stutter-Subtract.

10. star bug: does borrow columns but leaves ordinary columns unanswered.

Goal: Borrow

11. observed bug: Borrow-No-Decrement.

12. a star bug, Blank-With-Borrow-From, and various observed bugs,
    e.g., Smaller-From-Larger-With-Borrow.

13. star bug: does scratch marks for borrowing, but leaves the column unanswered.

Goal: Borrow-from

14. various observed bugs: e.g., Stops-Borrow-At-Zero.

15. star bug: never does the leftmost decrement of borrow, including BFZs.

Goal: BFZ

16. observed bug: Borrow-From-Zero.

17. a star bug, Blank-With-Borrow-Across-Zero, and various observed bugs,
    e.g., Borrow-Across-Zero.

18. observed bug: Don't-Decrement-Zero.

Figure 10-2
Results from each rule deletion of the subtraction procedure, indicating which generate star bugs.



Inspection of figure 10-2 reveals that the deletions that generate star bugs fall into two basic
groups. Half of the star deletions are rules beneath disjunctive goals (deletions of rules 2, 7, 10 and
15), or rather goals that function as OR goals even though they would not be marked as such under
the And exit convention. The remaining star bugs (rules 4, 12, 13 and 17) have the characteristic
that they know how to borrow from zero, indicating some sophistication in subtraction, but they
nonetheless leave certain columns unanswered. The juxtaposition of such sophistication in
borrowing with missing knowledge about answering columns makes these bugs highly unlikely. In
the next section, these star bugs will be discussed. In this section, attention will be focused on
blocking the star bugs of the first group.

Deleting any of rules 2, 7, 10 or 15 generates a star bug. These rules are all beneath
disjunctive goals. Deletion of the sisters of these rules (i.e., rules 1, 5, 6, 8, 9 or 14) generates bugs,
some of which are observed, but they are all bugs that would be generated by incomplete traversal
of the lesson sequence. Deletion under a disjunctive goal hurts the theory's empirical adequacy if it
affects it at all. To drive this point home, consider the Borrow-from goal. It has two rules.
Deleting the second one, rule 15, hurts the theory. Rule 15 does borrowing from non-zero digits.
It decrements the digit by one. If rule 15 is deleted, a star bug occurs. Some of the star bug's
work is illustrated below:

*Only-Borrow-From-Zero:   345        345        307
                        - 102      - 129      - 169
                          243 √      226 X      238 X

This star bug misses all problems requiring borrowing because it never performs a decrement,
despite the fact that it shows some sophistication in borrowing across zero in that it changes zeros to
nine. The juxtaposition of this competency in borrowing across zero with missing knowledge about
the simple case makes the bug highly unlikely.

The other rule of the goal Borrow-from, rule 14, does borrowing for zeros. It simply calls the
BFZ goal. If rule 14 is deleted, the procedure acts just as if BFZ had never been taught. In
particular, the deletion would generate the bug Stops-Borrow-At-Zero, which has been used as a
prime example of the combination of incomplete learning and local problem solving (see section
2.9). Blocking the deletion of rule 15 is good because it prevents generation of a star bug.
Blocking deletion of rule 14 doesn't hurt because its bugs have alternative derivations. The point is
this: If all rule deletions beneath Borrow-from are banned, the theory's predictions are improved.

The results of figure 10-2 clearly indicate that rule deletion should not apply to rules beneath
goals, such as Borrow-from, that are disjunctive in nature. Only rules beneath conjunctive goals
should be subject to deletion. Although the results of figure 10-2 are, of course, sensitive to the
particular structure used in the procedure of figure 10-1, the restriction on rule deletion has been
tested on Sierra with other procedures and found to hold up just as well. Although the restriction
still allows some star bugs to be generated, it blocks the generation of many others. Hence, there is
strong evidence that deletion should be constrained to delete only rules beneath goals that are
conjunctive in nature.

Conclusions 

Once again, the distinction between AND goals and OR goals has arisen from trying to fit an 
operator around the empirical evidence. The deletion operator needs the And-Or distinction, just as 
the Backup repair did. It seems that nature is trying to tell us something. The And-Or distinction 
seems a fundamentally important distinction, and as such should be given a clear expression in the 
representation language, rather than lurking in the rule's applicability conditions in the form of 
mutually exclusive predicates. On this basis, the And hypothesis will be eliminated from further 
consideration. Goals will have at least a binary type, AND versus OR, and rule deletion will be 
limited to rules that appear beneath AND goals. 

10.4 Satisfaction Conditions 

Conjunctive rule deletion generates all the deletion bugs and it avoids generating half of the 
star bugs of figure 10-2. However, it still allows the other half of the star bugs to be generated. 
These star bugs will be examined in detail in order to motivate a way of blocking their deletion. 

The main loop of subtraction, which traverses columns, has the following goal structure when 
it is translated into the And-Or exit convention from the neutral representation of figure 10-1: 

Goal: Multi (C)    Type: AND 

   3. (Sub1Col C) 

   4. (Sub-rest (Next-column C)) 

Goal: Sub-rest (C)    Type: OR 

   5. C is not the leftmost column => (Multi C) 

   6. C's bottom is blank => (Show C) 

   7. true => (Diff C) 

The applicability conditions of the rules have been adjusted. The conditions for the AND rules have 
been omitted. The conditions for the OR goal, Sub-rest, have been adjusted to reflect the fact that 
they are tested in order and only one is executed. For instance, if the column C is the leftmost 
column, then rule 5 will not be applicable, and control moves on to test rule 6. If that rule applies, 
the primitive Show answers the column, then control returns to Sub-rest. Since the goal is marked 
as an OR goal and one rule has been executed, no more rules are tested. In particular, the default 
rule, 7, will not be tested. Instead, the Sub-rest goal is popped. 

The Multi goal is an AND goal, so either of its subgoals can be deleted. Deleting rule 4 
creates a bug that only does the units column. Intuitively, only doing one column would be the 
mark of a student who has not yet been taught how to do multiple columns. Since doing multiple 
columns is always taught before borrowing, it would be highly unlikely for a student to know all 
about borrowing and yet do only the units column. To put it more formally, if all of BFZ were 
present when rule 4 is deleted, the procedure would generate a star bug: 

                               3        2 9
Only-Do-Units:    3 4 5      3 4 5      3 0 7
                - 1 0 2    - 1 2 9    - 1 6 9

                      3 ✗        6 ✗        8 ✗

If borrowing were not yet learned and rule 4 were deleted, then reasonable bugs would be 
generated. For instance, one reasonable, but as yet unobserved, buggy procedure does only the 
units column but it simply takes the absolute difference there instead of borrowing. It would be the 
bug set {Only-Do-Units Smaller-From-Larger}. In short, there is nothing wrong with the deletion 
of rule 4 per se, but it can create a procedure that mixes competence with incompetence in an 
unlikely manner. 

Another star bug of figure 10-2 occurs when rule 13 is deleted from Borrow, given the 
following version of column processing and borrowing: 

Goal: Sub1Col (C)    Type: OR 

   8. T<B in C => (Borrow C) 

   9. the bottom of C is blank => (Show C) 

   10. true => (Diff C) 

Goal: Borrow (C)    Type: AND 

   11. (Borrow-from (Next-column C)) 

   12. (Add10 C) 

   13. (Diff C) 

Deleting rule 13 generates a procedure that sets up to take the column difference after a borrow, 
but forgets to actually take it. This leads to the following star bug: 

                                    3        2 9
*Blank-With-Borrow:    3 4 5      3 4 5      3 0 7
                     - 1 0 2    - 1 2 9    - 1 6 9

                       2 4 3 ✓    2 1   ✗    1 3   ✗


What makes this bug so unlikely is that it leaves a blank in the answer despite the fact that it shows 
a sophisticated knowledge of borrowing. 

It is possible to put explicit constraints on conjunctive rule deletion in order to block the 
deletions that generate the star bugs. However, there is another way to prevent overgeneration that 
will be shown to have some advantages. The basic idea is to change the types of the two goals in 
question so that they are not AND goals, which makes the deletion operator inapplicable to them. 
That is, one changes the knowledge representation rather than the operator. 

The proposed change is to adopt a new goal type. The hypothesis is to generalize the binary 
AND/OR type to become satisfaction conditions. The basic idea of an AND goal is to pop when all 
subgoals have been executed, while an OR goal pops when one subgoal has been executed. The 
idea of satisfaction conditions is to have a goal pop when its satisfaction condition is true. Subgoals 
of a goal are executed until either the goal's satisfaction condition becomes true, or all the 
applicable subgoals have been tried. (Note that this is not an iteration construct, that is, an "until" 
loop, since a rule can only be executed once.) AND goals become goals with FALSE satisfaction 
conditions: Since subgoals are executed until the satisfaction condition becomes true (which it 
never does for the AND) or all the subgoals have been tried, giving a goal FALSE as its satisfaction 
condition means that it will always execute all its subgoals. Conversely, OR goals are given the 
satisfaction condition TRUE: The goal exits after just one subgoal is executed. 
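The exit convention just described can be sketched in a few lines of Python. This is an illustrative reading of the text rather than Sierra's interpreter: the class names, the dictionary used as a one-column problem state, and the primitive actions are all assumptions made for the example. Each rule is tried at most once, in order, and the goal's condition is tested after each rule that runs.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    name: str
    applicable: Callable[[dict], bool]
    action: Callable[[dict], None]


@dataclass
class Goal:
    name: str
    satisfied: Callable[[dict], bool]   # the satisfaction condition
    rules: List[Rule]


ALWAYS_FALSE = lambda s: False   # behaves like an AND goal
ALWAYS_TRUE = lambda s: True     # behaves like an OR goal


def run_goal(goal, state):
    """Run applicable rules once each until the goal is satisfied."""
    executed = []
    for rule in goal.rules:
        if not rule.applicable(state):
            continue
        rule.action(state)
        executed.append(rule.name)
        if goal.satisfied(state):    # tested after each executed rule
            break
    return executed


# Hypothetical primitives acting on a single column (T over B).
def add10(s): s["T"] += 10
def diff(s): s["answer"] = s["T"] - s["B"]
def show(s): s["answer"] = s["T"]
def borrow_from(s): s["borrowed"] = True

borrow = Goal("Borrow", ALWAYS_FALSE, [
    Rule("11", lambda s: True, borrow_from),
    Rule("12", lambda s: True, add10),
])

sub1col = Goal("Sub1Col", lambda s: s["answer"] is not None, [
    Rule("8", lambda s: s["B"] is not None and s["T"] < s["B"],
         lambda s: run_goal(borrow, s)),
    Rule("9", lambda s: s["B"] is None, show),
    Rule("10", lambda s: True, diff),
])

state = {"T": 5, "B": 9, "answer": None, "borrowed": False}
print(run_goal(sub1col, state), state["answer"])
```

For a column with 5 over 9, the sketch runs rule 8 (both Borrow rules execute because Borrow's condition is never true), finds the answer still blank, and goes on to rule 10. For a column that needs no borrow, only rule 10 runs.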

With this construction in the knowledge representation language, one is free to represent 
borrowing in the following way: 

Goal: Sub1Col (C)    Satisfaction condition: C's answer is non-blank 

   8. T<B in C => (Borrow C) 

   9. the bottom of C is blank => (Show C) 

   10. true => (Diff C) 

Goal: Borrow (C)    Satisfaction condition: FALSE 

   11. (Borrow-from (Next-column C)) 

   12. (Add10 C) 



The AND goal, Borrow, now consists of two subgoals. After they are both executed, control returns 
to Sub1Col. Because Sub1Col's satisfaction condition is not yet true (the column's answer is still 
blank), another subgoal is tried. Diff is chosen and executed, which fills in the column answer. 
Now the satisfaction condition is true, so the goal pops. 

Given this encoding of borrowing, the conjunctive rule deletion operator does exactly the 
right thing when applied to Borrow. In particular, since rule 13 is no longer present, it is no longer 
possible to generate the star bug, *Blank-With-Borrow-From, by deleting it. Rule 13 has been 
merged, so to speak, with rule 10. Since rule 10 is under a non-AND goal, Sub1Col, it is protected 
from deletion. 

Similarly, the star bug associated with column traversal can be avoided by restructuring the 
loop across columns. The two goals, Multi and Sub-rest, are replaced by a single goal: 

Goal: SubAll (C)    Satisfaction condition: C is the leftmost column 

   5. true => (Sub1Col C) 

   6. true => (SubAll (Next-column C)) 

The goal first processes the given column by calling the main column processing goal, Sub1Col. 
Then it checks the satisfaction condition. If the column is the problem's leftmost column, the goal 
pops. Otherwise, it calls itself recursively. By using a satisfaction condition formulation, generation 
of the star bug is avoided. The AND goal, Multi, has been eliminated along with its rule 4, the rule 
whose deletion caused the star bug. 

These two illustrations indicate that augmenting the representation with satisfaction conditions 
creates an empirically adequate treatment of deletion. Satisfaction conditions were used for several 
years in Sierra (Brown & VanLehn, 1980; VanLehn, 1983). However, as the first versions of Sierra's 
learner were implemented, a fatal flaw was discovered. In essence, if the learner was constructed so 
that it would put satisfaction conditions on Sub1Col and SubAll, it would also put satisfaction 
conditions on other goals, which, unfortunately, would block the generation of certain deletion bugs. 
This hurts the empirical adequacy of the theory. Various ad hoc constraints can be imposed to 
"fix" the flaw, but they, in turn, cause further empirical difficulties. The whole story is quite 
complex, so it has been relegated to appendix 10. 

As shown there, a simpler solution is to abandon satisfaction conditions entirely. Instead, an 
explicit constraint is added to the deletion operator: It may only delete rules from the most recently 
acquired AND goal. To see how this works, consider the star bug mentioned earlier, Only-Do-Units. 
This bug is an unreasonable prediction precisely because it exhibits perfect knowledge of borrowing, 
but has, apparently, "forgotten" that all columns need answering. To generate this star bug, a 
certain rule of the goal Multi (see above) must be deleted. It cannot be deleted after borrowing- 
from-zero is learned, because that is precisely what the new constraint blocks. But the deletion 
won't survive if it occurs before borrow-from-zero is learned. To see this, suppose it were deleted 
before then. The learner would try to use the damaged procedure to parse the borrow-from-zero 
examples. Even the simplest borrow-from-zero examples have at least two columns that require 
answers. Because of the rule deletion, the learner is unable to match the actions of the examples 
that answer the non-units columns. Of course, the learner is also unable to match the new borrow- 
from-zero subprocedure, which is the topic of the lesson. To assimilate the example, the learner 
would have to adjoin two disjuncts to the procedure: one to handle the extra answer actions, and 
one to handle the new subprocedure. Adding two disjuncts is prohibited by one-disjunct-per-lesson. 
In short, the damaged procedure cannot be augmented with the borrow-from-zero subprocedure. 
The case presented in this illustration is a typical one. Rule deletions that occur early in the lesson 
sequence generally will not survive long. A damaged procedure will be asked to take a lesson that 
assumes it has a subprocedure that it doesn't have, and normal, one-disjunct-per-lesson learning will 
not let this procedure pass. (In real classrooms, what probably happens is that students who have 
such damaged core procedures are discovered and remediated.) Hence, the only way to 
inappropriately juxtapose incompetency with competency is to delete rules from subprocedures well 
after they are acquired. This way of generating star bugs is exactly what the new constraint blocks. 
Only the AND rules of newly acquired subprocedures may be deleted. 

10.5 Summary, formal hypotheses and conflict resolution 

The goal type issue has proved to be a subtle one. The alternative exit conventions are fairly 
clear cut: 

1. And: A goal is popped when all subgoals have been tried. 

2. And/Or: Goals have a binary type. If the goal's type is AND, all applicable subgoals are 
   executed before the goal is popped. If it is OR, the goal pops as soon as one subgoal is 
   executed. 

3. Satisfaction conditions: Goals have a condition which is tested after each subgoal is executed. 
   If the condition is true, the goal is popped. Metaphorically speaking, the goal keeps trying 
   different subgoals until it is satisfied. 

However, the arguments between them were weak. What evidence there is indicates that the And- 
Or exit convention is the best. A brief review of the arguments follows. 

There were three arguments against the And hypothesis. Two were based on parsimony. In 
order to have a simple expression of the Backup repair and conjunctive rule deletion, goals should 
be marked with a binary type to differentiate AND goals from OR goals. Under the And hypothesis, 
it is still possible to express the operators, but they must analyze the applicability conditions on 
rules in order to distinguish AND goals from OR goals. The third argument against the And 
hypothesis relies on intuition. Intuition, but little else, supports a conjectured felicity condition 
called the assimilation conjecture. It states that new knowledge structures can be acquired without 
changing the old ones, except in certain narrowly prescribed ways. Since the And hypothesis uses 
mutually exclusive applicability conditions to express disjunctive goals, it forces old applicability 
conditions to be adjusted when a new one is added. The And hypothesis forces learning to violate 
the assimilation conjecture. 

Satisfaction conditions are a generalization of the And-Or convention. Under the And-Or 
convention, a goal pops after either one or all of its subgoals are executed. With satisfaction 
conditions, a goal may pop after any number of its subgoals have been executed, where the number 
of subgoals executed is governed by a condition. The extra degrees of freedom in expressive power 
can be used to control the predictions. Indeed, an attempt to do this was made by using satisfaction 
conditions to control the deletion operator. If satisfaction conditions were based on just the right 
goals, the theory avoided generating several pesky star bugs. However, it turned out to be difficult 
to account for the acquisition of satisfaction conditions on just these goals and not the others. The 
satisfaction conditions approach is weakened because learning cannot explain their existence. This 
leaves the And-Or hypothesis the only one standing. 

Formal hypotheses 

The main results of this chapter are summarized in the following three hypotheses: 

And-Or 

   Goals bear a binary type. If G is the current goal in runtime state S, then 
   (ExitGoal? S) is true if 

   1. G is an AND goal and all its rules have been executed, or 

   2. G is an OR goal and at least one of its rules has been executed. 

AND rule deletion 

   (Delete P) returns a set of procedures P' such that P' is P with one or more AND rules deleted. 

Most recent rule deletion 

   (Delete P) returns a set of procedures P' such that P' is P with one or more rules deleted from 
   the most recently acquired subprocedure. 
These hypotheses define two previously undefined functions, Delete and ExitGoal?. The latter 
was used in the definition of the interpreter in section 9.4. It controls whether a goal is popped, 
given that the stack has just popped back to it. 
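As a rough illustration of what ExitGoal? and Delete compute, here is a minimal Python sketch. It is not the definition used by the theory's interpreter: the `Goal` record, the `acquired` lesson index, and the example procedure are assumptions introduced for the example. The sketch also simplifies in two ways. It deletes exactly one rule at a time, whereas the hypotheses allow one or more, and it combines the two deletion hypotheses by restricting deletion to the most recently acquired AND goal.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Goal:
    name: str
    kind: str                 # "AND" or "OR"
    rules: Tuple[str, ...]    # rule labels, in order
    acquired: int             # hypothetical lesson index (assumption)


def exit_goal(goal: Goal, executed: List[str]) -> bool:
    """ExitGoal?: AND goals pop after all rules, OR goals after one."""
    if goal.kind == "AND":
        return all(rule in executed for rule in goal.rules)
    return len(executed) >= 1


def delete(procedure: List[Goal]) -> List[List[Goal]]:
    """Single-rule deletions from the most recently acquired AND goal."""
    and_goals = [g for g in procedure if g.kind == "AND"]
    if not and_goals:
        return []
    target = max(and_goals, key=lambda g: g.acquired)
    variants = []
    for rule in target.rules:
        damaged = replace(
            target, rules=tuple(r for r in target.rules if r != rule))
        variants.append([damaged if g is target else g for g in procedure])
    return variants


procedure = [
    Goal("Multi", "AND", ("3", "4"), acquired=1),
    Goal("Sub1Col", "OR", ("8", "9", "10"), acquired=1),
    Goal("Borrow", "AND", ("11", "12", "13"), acquired=2),
]
for variant in delete(procedure):
    print([(g.name, g.rules) for g in variant])
```

In this example every variant leaves Multi intact, because Borrow was acquired later. Rules under the OR goal Sub1Col are never candidates for deletion.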

Conflict resolution 

The other undefined function used in the interpreter is PickRule. Its definition is simple to 
motivate. However, it depends on the And-Or distinction, which is why its definition has been 
deferred until now. 

The function PickRule decides which of the current goal's rules the interpreter should run 
next. This choice is governed by conventions that are called conflict resolution strategies in the 
production system literature (McDermott & Forgy, 1978). Two conventions are needed just to get 
the interpreter to work at all: 

1. If a rule has already been executed for this instance of the goal, then it may not be executed again. 

2. If the applicability condition of a rule is false, it may not be run. 

These two conventions were discussed in section 9.4. However, they do not always settle the 
question of which rule to pick. There are often several unexecuted, applicable rules for the current 
goal. More conventions are needed. It is convenient to discuss conventions for AND goals 
separately from the conventions for OR goals. 

AND goal conflict resolution 

The problem with AND rules is expressing their sequential order. One way to get AND rules 
to run in sequence is to use the applicability conditions. In a production system, one can force 
rules to be executed in sequence by having each rule add a token to the working memory 
(execution state) that will trigger its successor rule and only its successor rule. This will not work 
here because the only internal state is the stack and the microstate bit. There is no buffer to add 
tokens to. On the other hand, the applicability condition could sense the external state (i.e., what 
the problem looks like). This would suffice for sequential ordering in many cases. However, it 
would interact with rule deletion in such a way that several of the deletion bugs could not be 
generated. I won't go through the details here. The point is that applicability conditions cannot be 
used to express the sequential order of AND rules. Some explicit convention is needed. About the 
simplest one possible is to represent AND rules in an ordered list, and to execute the rules in the 
order that they appear in the AND goal's list. It is the one that will be adopted. 

OR goal conflict resolution strategies 

The learner must induce the applicability conditions of OR rules. In particular, a new 
subprocedure will have a new OR rule that calls it. This OR rule is called the adjoining rule because 
it attaches the new subprocedure to the existing procedural structure. In order to induce the 
applicability condition of the adjoining rule, it is extremely helpful for the learner to be presented 
with negative instances of it. A negative instance of an applicability condition is a problem state 
where the applicability condition is false. Teachers and textbooks do not usually show negative 
instances to the students explicitly (i.e., in discrimination examples). However, the learner can 
recover negative instances from the kinds of examples they do get, under certain circumstances. 
Suppose the new subprocedure is a sister of some goal G. That is, G and the new subprocedure 
are both subgoals of the same OR goal. Given a worked example, the learner can discover whether 
G has been executed and if so, what the problem state was at the time it was invoked. Call this 
problem state S. Since G and the new subprocedure are OR-sisters, the new subprocedure could 
have been picked to run at S, but the teacher did not do so. Because the new subprocedure was 
not run at S, its adjoining rule's applicability condition must have been false. Therefore, S is a 
negative instance for it. S is just what induction needs. 

However, there is a subtle flaw in the inference just given. The choice of rule depends both 
on applicability conditions and on conflict resolution strategies. The learner cannot be sure that the 
adjoining rule's applicability condition is false at S unless the learner knows that the adjoining rule 
would be favored over the others in cases where they were both true. The only way to guarantee 
this is to adopt a conflict resolution strategy that is based on the time of acquisition. That is, when 
two rules are both applicable, the rule that was learned most recently is chosen. This convention, 
called "recency in long-term memory" when used in production systems, guarantees that the 
adjoining rule would have been chosen if it had been applicable: Since it wasn't chosen, it must 
not have been applicable. Therefore S is a valid negative instance for the induction of the adjoining 
rule's applicability condition. 

In short, there is really no choice about conflict resolution strategies, given that goals are 
typed. The conflict resolution strategies can be summed up with the following hypothesis: 

Conflict resolution 

   1. A rule may be executed only if its applicability condition is true and it has not yet 
      been executed for this instance of the current goal. 

   2. In the representation of the procedures, the rules of AND goals are linearly ordered. 
      If the current goal is an AND goal, and there are several unexecuted, applicable 
      rules, then choose the first one in the goal's order. 

   3. If the current goal is an OR goal, and there are several unexecuted, applicable rules, 
      then choose the rule that was acquired most recently. 

Clauses 2 and 3 imply that AND goals and OR goals have similar syntax. They both have ordered 
lists of rules. Moreover, the interpreter always picks the first unexecuted rule, for AND goals, and 
the first unexecuted rule whose applicability condition is true, for OR goals. The major difference is 
what happens after the respective rules are executed. OR goals immediately pop, AND goals do not. 
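The three clauses of the conflict resolution hypothesis translate directly into a small selection function. The following Python sketch is an illustration only: the `Rule` record, the `acquired` index standing in for time of acquisition, and the example acquisition order of the Sub1Col rules are assumptions, not details given in the text.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set


@dataclass
class Rule:
    name: str
    applicable: Callable[[dict], bool]
    acquired: int    # hypothetical lesson index; larger means more recent


def pick_rule(kind: str, rules: List[Rule], executed: Set[str],
              state: dict) -> Optional[Rule]:
    # Clause 1: only unexecuted rules whose condition is true are candidates.
    candidates = [r for r in rules
                  if r.name not in executed and r.applicable(state)]
    if not candidates:
        return None
    if kind == "AND":
        # Clause 2: AND rules are linearly ordered; take the first candidate.
        return candidates[0]
    # Clause 3: for OR goals, take the most recently acquired candidate.
    return max(candidates, key=lambda r: r.acquired)


# Rules of an OR goal like Sub1Col, with an assumed acquisition order.
sub1col_rules = [
    Rule("Diff", lambda s: True, acquired=1),
    Rule("Show", lambda s: s["B"] is None, acquired=2),
    Rule("Borrow", lambda s: s["B"] is not None and s["T"] < s["B"],
         acquired=3),
]

print(pick_rule("OR", sub1col_rules, set(), {"T": 5, "B": 9}).name)
print(pick_rule("OR", sub1col_rules, set(), {"T": 7, "B": 2}).name)
```

Because the default rule Diff is always applicable, it is the recency ordering that lets Borrow win on a column such as 5 over 9. Storing OR rules with the most recent first would make "first applicable rule" and "most recent applicable rule" coincide, which is the similarity in syntax noted above.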
This chapter discusses how the procedure representation language should represent data flow. 
In general, data flow in a procedure is the set of mechanisms and conventions for the storage and 
transmission of data, such as numbers or other symbols. (N.B., I am using "data flow" in the sense 
of Rich and Shrobe (1976) rather than the more common usage of Dennis (1974) and other authors 
who discuss data flow languages.) In the kinds of procedures that concern this theory, the data flow 
issue hinges mostly on how the language should represent focus of visual attention. As students 
solve subtraction problems, they seem to focus their visual attention on various digits or columns of 
digits at various times. This can be inferred from eye tracking studies (Newell & Simon, 1972; 
Buswell, 1926). It can also be inferred from the information that students read from the paper. 
The issue this chapter discusses is how to represent the fact that focus of attention is held 
unchanged for periods of time as well as being shifted. To put it in slightly inaccurate terms, the 
issues concern the short-term storage of visual focus. Four positions will be contrasted: 

1. No data flow: The hypothesis is that procedures do not use data flow. In particular, there is 
   no internal storage of focus of attention. Instead, the places where the procedure reads and 
   writes on the page are described by static descriptions. For instance, instead of describing the 
   place to write an answer digit as "the answer position in the current column," it would be 
   described as "just to the left of the leftmost digit in the answer row." This description does 
   not use the notion of a current focus of attention. It describes locations statically. 

2. Globally bound data flow (registers): A leading contender for storing and transmitting focus of 
   attention is to use registers that store either the position on the paper that is currently being 
   examined, or a focus of attention that was once current and is being saved for some reason. 
   By "register," I will mean a globally bound data storage resource. The current contents of a 
   globally bound register is determined solely by chronology. Its contents is the value most 
   recently set into the register. 

3. Locally bound data flow (schema-instance): To describe this hypothesis, the notion of an 
   instantiation of a goal is needed. When a goal is called, it is pushed onto the goal stack along 
   with a little extra information. This extra information is labelled an instantiation of the goal. 
   If a goal is called recursively, there will be two instantiations of it on the stack. The basic 
   idea of the schema-instance data flow hypothesis is that data can be stored with each 
   instantiation of a goal. The goal is viewed as a schema with certain open parameters, called 
   arguments. Instantiating a goal fills in the values of these open parameters, an operation 
   called binding the arguments. This way of structuring data flow is called local binding in 
   recursive programming languages, such as Lisp. In object-oriented languages, such as 
   Smalltalk, the same idea is used a little differently. In order to include object-oriented 
   languages in the purview of this hypothesis, the hypothesis' name is schema-instance instead 
   of the less general name, local binding. 

4. Applicative data flow: The schema-instance data flow hypothesis allows different instantiations 
   of a goal to have different foci of attention stored with them. However, it is possible for an 
   instantiation of a goal to change its stored focus and even to change the stored foci of other 
   goals' instantiations. The applicative data flow hypothesis outlaws such changes. Once an 
   instantiation's arguments are bound, they can never be changed. Applicative data flow is used 
   by applicative languages, such as pure Prolog or the lambda calculus. 
The basic issue behind these various options concerns how independent data flow should be from 
control flow. The discussion starts with the weakest, most inflexible hypothesis: The no-data-flow 
hypothesis is that data flow is congruent with the static structure of the procedure. Thus, focus of 
attention doesn't depend on which instantiation of the goal is executing, but only on the name of 
the goal that is executing. This turns out to be too inflexible. It can't express certain bugs. 

The next strongest hypothesis is the applicative hypothesis. It assumes that data flow is 
congruent with the dynamic structure of the procedure. If there is a stack, the focus of attention 
changes when, and only when, the stack is pushed or popped. This turns out to be the best of the 
four alternative hypotheses. 

The global binding hypothesis allows data flow and control flow to be totally independent. 
The procedure is allowed to change the focus of attention without changing the flow of control, and 
vice versa. It is shown that this independence leads to problems in the theory. In particular, 
certain shifts in control necessitate a shift in data flow. In such cases, mainly concerning popping 
the stack, the independence of control and data flow must be curtailed: data flow must parallel 
control flow then. Such cases motivate the schema-instance hypothesis. 

The schema-instance hypothesis is halfway between the total independence of control flow 
and data flow, and the total isomorphism of the two that is stipulated by the applicative hypothesis. 
Under the schema-instance hypothesis, when control flow changes, data flow changes. If there is a 
stack, focus automatically shifts when the stack changes because the top of the stack is what holds 
the current focus of attention. However, data flow may also be changed when control flow does not 
change. This extra degree of freedom is never used in any of the procedures implicated by the 
data. To explain this, a constraint is added: when the stack does not change, focus of attention does 
not change. The result is the applicative data flow hypothesis, that data flow can change when, 
and only when, control flow changes. 
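The difference between a globally bound register and locally bound arguments can be seen in a generic programming illustration. The Python sketch below is not taken from the theory's procedures: the function names and the column-numbering scheme (0 for the first column visited) are assumptions. Both functions recurse across columns and record which column is in focus on entry to and exit from each instantiation.

```python
REGISTER = {"focus": None}   # a single globally bound register


def visit_global(n_columns, log):
    """Focus lives in one register; its value is whatever was set last."""
    log.append(("enter", REGISTER["focus"]))
    if REGISTER["focus"] < n_columns - 1:
        REGISTER["focus"] += 1          # shift focus, then recurse
        visit_global(n_columns, log)
    log.append(("exit", REGISTER["focus"]))


def visit_local(col, n_columns, log):
    """Focus is an argument bound once per instantiation and never reset."""
    log.append(("enter", col))
    if col < n_columns - 1:
        visit_local(col + 1, n_columns, log)
    log.append(("exit", col))


def run_global(n_columns):
    REGISTER["focus"] = 0
    log = []
    visit_global(n_columns, log)
    return log


def run_local(n_columns):
    log = []
    visit_local(0, n_columns, log)
    return log


print(run_global(3))
print(run_local(3))
```

With the register, every instantiation exits with the focus still on the last column visited, since popping the stack does not restore the earlier value. With local binding, each instantiation exits with the focus it was entered with. Because `visit_local` never reassigns `col`, it also satisfies the stricter applicative constraint.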

11.1 The hypothesis that there is no data flow 

It may be that there is no need to have an explicit representation for data flow. Maybe it will 
suffice just to have a push-down automaton, not an ATN with its registers. This would put a burden 
on the procedure's interface with the problem state. In order to traverse columns in subtraction, 
instead of passing the current column in a register, the patterns (or whatever implements the 
interface) would have to describe the focus of attention as the rightmost unanswered column. Since 
there is a visual marker for where focus of attention needs to be, namely the boundary between 
answered and unanswered columns, this technique will succeed. 

However, there are bugs which leave answers blank. These cannot be represented by using 
the boundary between answered and unanswered columns. For instance, one observed bug skips 
columns which require borrowing: 



The derivation of this bug assumes that the student hasn't learned how to borrow yet. When the 
student attempts to take a larger number from a smaller one, an impasse occurs. The repair to this 
impasse is the Noop repair. It causes the column difference action to be skipped. If the procedure 
is using the boundary between answered and unanswered columns to determine the focus of 
attention, then after the Noop repair, the procedure will return to focus on the column that it just 
finished. It won't shift its attention to the next column left, as the bug does. Instead, the procedure 
will go into an infinite loop examining the same column over and over again. This is clearly a star 
bug. Not only does the no-data-flow hypothesis prevent the generation of an observed bug, it 
causes a star bug to be generated. 
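The looping behaviour can be made concrete with a short simulation. This is a minimal sketch under stated assumptions: digits are held in Python lists, the Noop repair is modelled as simply leaving the answer blank, and `max_steps` is an artificial bound added so that the otherwise endless loop terminates. None of these names come from the theory itself.

```python
def rightmost_unanswered(answer):
    """Static description of focus: the rightmost column with no answer."""
    for col in range(len(answer) - 1, -1, -1):
        if answer[col] is None:
            return col
    return None


def solve_without_focus(top, bottom, max_steps=10):
    """No stored focus: recompute it from the answer row on every step."""
    answer = [None] * len(top)
    visited = []
    for _ in range(max_steps):
        col = rightmost_unanswered(answer)
        if col is None:
            break
        visited.append(col)
        if top[col] >= bottom[col]:
            answer[col] = top[col] - bottom[col]
        # Otherwise the Noop repair skips the difference; the column stays blank.
    return answer, visited


def solve_with_focus(top, bottom):
    """Stored focus: move one column left regardless of what was written."""
    answer = [None] * len(top)
    for col in range(len(top) - 1, -1, -1):
        if top[col] >= bottom[col]:
            answer[col] = top[col] - bottom[col]
    return answer


print(solve_without_focus([3, 4, 5], [1, 2, 9], max_steps=5))
print(solve_with_focus([3, 4, 5], [1, 2, 9]))
```

On 345 - 129 the version without a stored focus revisits the units column on every step and never writes anything, while the version with a stored focus skips the units column and answers the other two, as the observed bug does.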

Moving beyond the subtraction skill, one finds that there are procedures that clearly need 
some kind of "current focus pointer" in order to traverse lists without the aid of visual markers. 
For example, children can add long columns of digits. This seems to require some kind of register 
or a counter or something that indexes down the digits in a column. To represent the traversal with 
a pure push-down automaton, one would have to have distinct states for each digit in order to have 
distinct patterns to fetch that digit. If the push-down automaton were finite, then there would be 
some finite limit on the number of digits the student could add. This seems totally unlike human 
mathematical skill. 

There is another argument against the position that procedures do not maintain some kind of 
current focus pointer. It is a reductio ad absurdum argument. Consider taking a subtraction test. 
Under the no-data-flow hypothesis, the patterns in the procedure are used to distinguish the column 
being worked on from the others by taking advantage of the fact that the columns to the right are 
answered. However, something must also specify the subtraction problem being worked on. To do 
so, the procedure's patterns might use the fact that the exercise problem is the one that has only 
solved exercises before it and unsolved exercises after it. Going one step further, the patterns must 
specify which piece of paper is the test paper. Clearly, the patterns are being burdened with quite a 
bit of description. The no-data-flow hypothesis entails that patterns mention things that are 
irrelevant to subtraction. It makes silly predictions. It might predict that a student would believe 
that a subtraction problem can only be done on a chalkboard or in a textbook, since that is the only 
place the student sees examples being done. It seems that there has to be some current focus 
pointer somewhere in order to have the procedure retain any degree of modularity at all. 

11.2 Focus is not globally bound 

The previous section showed that the procedure is somehow storing and maintaining a current 
focus of attention. This section compares two ways to do this: globally bound variables (registers) 
and locally bound variables. In the interest of factoring the hypotheses of the theory as 
independently as possible, it will not be assumed that the control structure is recursive. This makes 
the nomenclature more awkward, but gives the resulting conclusions a little more generality. One 
other assumption is needed before the main argument can be presented. It will be assumed that the 
Backup repair exists. The defense of this assumption is in section 1 of appendix 9. Some important 
features of this repair are most easily described with an example of its operation. 

Figure 11-1 gives an idealized protocol. It illustrates a moderately common bug (Smaller- 
From-Larger-Instead-of-Borrow-From-Zero). In the Southbay sample of 375 students with bugs, 
five students had this bug. The (idealized) subject of figure 11-1 does not know all of the 
subtraction procedure. In particular, he does not know about borrowing from zero. When he 
tackles the problem 305-167, he begins by invoking a Sub1Col goal. Since 5 is less than 7, he 
invokes a Borrow subgoal (episode a in the figure), and immediately the first of borrowing's two 
subgoals, namely Borrow-from (episode b). At this point, he gets stuck since the digit to be 
borrowed from is a zero, which cannot be decremented in the natural number system. He is at an 
impasse. Several repairs can be used at impasses to get unstuck. The one that interests us here is 
called the Backup repair. It gets past the decrement-zero impasse by "backing up," in the problem 
solving sense, to the last goal which has some open alternatives. In this case there are five active 
goals: 
a.    305    In the units column, I can't take 7 from 5, so I'll 
    - 167    have to borrow. 

b.    305    To borrow, I first have to decrement the next 
    - 167    column's top digit. But I can't take 1 from 0! 

c.    305    So I'll go back to doing the units column. I still can't 
    - 167    take 7 from 5, so I'll take 5 from 7 instead. 

      2 
d.    305    In the tens column, I can't take 6 from 0, so I'll have to borrow. 
    - 167    I decrement 3 to 2 and add 10 to 0. That's no problem. 

      2 
e.    305    Six from 10 is 4. That finishes the tens. The hundreds is 
    - 167    easy, there's no need to borrow, and 1 from 2 is 1. 
      142 

Figure 11-1 

Pseudo-protocol of a student performing the bug 
Smaller-From-Larger-Instead-of-Borrow-From-Zero. 
1. Borrow-from: a goal that normally just decrements a digit 

2. Borrow: a goal that processes columns that require borrowing 

3. Sub1Col: the main column processing goal 

4. Multi: a goal that traverses across multiple columns 

5. Sub: the top-level goal for solving a subtraction problem 

The Borrow-from goal has failed. The Borrow goal has no alternatives: one always borrows-from 
then borrows-into. The next most distant goal, Sub1Col, has alternatives: one alternative for 
columns that need a borrow, and one for columns that do not need a borrow. Since Sub1Col has 
open alternatives, Backup returns control to it. The evidence for backing up occurs in episode c 
where the subject says "So I'll go back to doing the units column." In the units column he hits a 
second impasse, saying "I still can't take 7 from 5," which he repairs ("so I'll take 5 from 7 
instead"). He finishes up the rest of the problem without difficulty. 

The crucial feature of the analysis above, for this argument, is that Backup caused a transition 
from a goal (Borrow-from) located at the top digit in the tens column to a goal (Sub1Col) located at 
the units column. Backup caused a shift in the focus of attention from one location to another. 
Moreover, it happens that the location it shifted back to was the one that the Sub1Col goal was 
originally instantiated on, even though that column turned out to cause problems in that further 
processing of it led to a second impasse. So, it seems no accident that Backup shifted the location 
back to Sub1Col's original site of invocation. Backup shifts both focus and control. 

Incidentally, I expect this focus-shifting property of Backup to remain uncontradicted by 
evidence from other domains. In Newell and Simon's study of eye movements during the solution 
of cryptarithmetic puzzles, for example, there is ample evidence that backing up (popping a goal in 
their system) restores not only the goal, but the focus of visual attention that was current when the 
goal was last active (Newell & Simon, 1972, pp. 323-325). 

With the empirical evidence on the table, the basic argument is simple to state: If focus is 
bound to instantiations of goals, then backing up to a goal automatically restores focus of attention. 
The schema-instance hypothesis captures the facts quite nicely. The global binding hypothesis runs 
into trouble. If focus is globally bound (e.g., in a register), then the Backup repair would have to 
be formulated so that it explicitly resets focus as it sends control back to a goal. But how would 
Backup know what to reset the focus register to? By hypothesis, the only "memory" for focus is the 
focus register. Hence, Backup would have to (1) analyze the procedure's structure to figure out how 
the current focus was calculated, then (2) run these calculations backwards in order to obtain the 
value that is to be set into the focus register. Clearly, this makes Backup a very powerful repair. 
Not only can it do static analysis of control structure, but it can simulate a procedure running 
backwards! It is much more powerful than the other repairs, which do simple things like skipping a 
stuck action. Backup is so powerful that it can potentially model any conceivable student behavior. 
This would make the theory irrefutable, not to mention implausible. In short, if focus is locally 
bound, Backup is simple; if focus is globally bound, Backup is powerful and implausible. 
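
The contrast can be made concrete with a small sketch. It is an illustration in Python only, not 
Sierra's implementation: the class GoalInstance, the function backup, the focus labels and the 
counts of open alternatives are all assumptions introduced here. Each goal instantiation carries its 
own focus, so returning control to an instantiation restores its focus with no extra machinery. 

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoalInstance:
    """One instantiation of a goal; its focus is fixed when it is created."""
    goal: str
    focus: str              # hypothetical label for a region of the problem state
    open_alternatives: int  # untried alternatives (illustrative counts)


def backup(stack):
    """Return the nearest enclosing instance that still has open alternatives.

    The instance on top of the stack is the one that hit the impasse.
    """
    stack = list(stack)
    stack.pop()  # discard the stuck goal
    while stack and stack[-1].open_alternatives == 0:
        stack.pop()
    return stack[-1] if stack else None


# The five active goals at the decrement-zero impasse on 305 - 167,
# outermost first. Counts for Sub and Multi are placeholders; Backup
# never reaches them in this example.
stack = [
    GoalInstance("Sub", "whole problem", 0),
    GoalInstance("Multi", "units column", 0),
    GoalInstance("Sub1Col", "units column", 1),
    GoalInstance("Borrow", "units column", 0),
    GoalInstance("Borrow-from", "top digit of tens column", 0),
]

resumed = backup(stack)
print(resumed.goal, "/", resumed.focus)
```

With a single global focus register, the stack would hold goal names only, and nothing in it would 
tell Backup that the register should be reset to the units column. 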

There are various ways that the global binding hypothesis can be patched up. One can 
provide multiple focus registers, for instance. As it turns out, there are excellent arguments against 
such augmentations to the global binding hypothesis. The arguments are rather complex, although 
quite elegant at times. They have been relegated to an appendix (see section 3 of appendix 9). At 
any rate, none of the versions of the global binding hypothesis has the empirical and explanatory 
adequacy that the schema-instance hypothesis has. So the global binding hypothesis will be 
rejected. 
What this means is that representations that do not employ schemata and instances, such 
as finite state machines or flow charts with registers, can be dropped from consideration. This puts 
us, roughly speaking, on the familiar ground of "modern" representation languages for procedures, 
such as stack-based languages, certain varieties of production systems, certain message passing 
languages, and so on. 

11.3 The applicative hypothesis 

The deletion operator (chapter 7), regardless of how it is formalized exactly, is a valuable tool 
for examining data flow. Having an operator that mutates the knowledge representation allows one 
to infer the structure of the representation. An important use of this tool is to uncover one of the 
tacit constraints on data flow. 

A prominent fact about bugs is that none of them require deletion of focus shifting functions. 
For example, if one knows about borrowing-from, one knows to borrow from a column to the left. 
No bug has been observed that forgets to move over before borrowing-from. This fact deserves 
explanation. 

In all the illustrations so far, focus shifting has been embedded in rule actions (right hand 
sides). This is no accident. Suppose one did not embed them, but made them separate actions, as 
in 

Goal: Borrow (C) Type: AND 
1. (Borrow-into C) 
2. (C ← (Next-column C)) 
3. (Borrow-from C) 

The ← represents a variable setting operation (i.e., a SETQ). A star bug is generated by deleting 
rule 2. This star bug would borrow from the column that originates the borrow: 

Borrow-From-Self: 
                    14          9 16
    3 4 5        3 4 5        2 0 7
  - 1 0 2      - 1 2 9      - 1 6 9
    2 4 3 ✓      2 2 5 ✗      1 3 7 ✗

In order to avoid such star bugs, focus shifting functions must be embedded. A constraint upon the 
knowledge representation is needed. About the strongest constraint one can impose is to stipulate 
that the language be applicative. That is, data flows by binding variables rather than by assignment. 
There are no side effects: a goal cannot change the values of another goal's variables, nor even its 
own variables. The only way that information can flow "sideways" is by making observable changes 
to the external state, that is, by writing on the exercise problem. 
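
The Borrow-From-Self star bug can be reproduced with a short simulation. The sketch below is an 
assumption-laden illustration, not the model itself: the function subtract and its flag 
delete_focus_shift are invented for this purpose, and borrowing from zero is simplified by letting a 
digit go temporarily negative. With the flag set, the separate focus-shifting rule is skipped, so the 
decrement lands on the column that originated the borrow. 

```python
def subtract(top, bottom, delete_focus_shift=False):
    """Column-by-column subtraction (assumes top >= bottom >= 0).

    Borrow is written with the focus shift as a separate assignment,
    mirroring the three-rule version of the Borrow goal. Setting
    delete_focus_shift=True deletes rule 2.
    """
    t = [int(d) for d in reversed(str(top))]      # units digit first
    b = [int(d) for d in reversed(str(bottom))]
    b += [0] * (len(t) - len(b))
    answer = []
    for col in range(len(t)):
        if t[col] < b[col]:
            c = col
            t[c] += 10                 # rule 1: (Borrow-into C)
            if not delete_focus_shift:
                c = c + 1              # rule 2: C <- (Next-column C)
            t[c] -= 1                  # rule 3: (Borrow-from C)
        answer.append(t[col] - b[col])
    return int("".join(str(d) for d in reversed(answer)))


for top, bottom in [(345, 102), (345, 129), (207, 169)]:
    print(top, "-", bottom, "=", subtract(top, bottom, delete_focus_shift=True))
```

The three problems are the ones shown for Borrow-From-Self; the buggy answers are 243, 225 and 
137, while the intact procedure gives 243, 216 and 38. 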

The applicative hypothesis is extremely strong, forcing data to flow only vertically. The 
procedure can pass information down from goal to subgoal through binding the subgoal's 
arguments. Information flows upward from subgoal to goal by returning results. No 
counterexamples to the applicative hypothesis have been found so far. 

Applicative data flow enables context-free subprocedure acquisition 

The applicative hypothesis has a profound effect on learning. It makes learning 
subprocedures context free. That is, learning a hierarchical procedure becomes roughly equivalent 
to inducing a context free grammar. The basic idea is that the applicative hypothesis, together with 
the recursive control hypothesis (chapter 9), forces data and control flow to exactly parallel each 
other. To put it in terms of grammars, the data flow subcategorizes the goals. This, in turn, makes 
it much easier to induce the goal hierarchy from examples. 

Inducing a procedure's calling hierarchy (i.e., goal-subgoal hierarchy) from examples has 
proved to be a tough problem in AI. Neves (1981) used hierarchical examples to get his procedure 
learner to build hierarchy. However, subtraction teachers do not always use such examples. Badre 
(1972) recovers hierarchy by assuming examples are accompanied by a written commentary. Each 
instance of the same goal is assumed to be accompanied by the same verb (e.g., "borrow"). This is 
a somewhat better approximation to the kind of input that students actually receive, but again it 
rests on delicate and often violated assumptions. Anzai (1979) uses various kinds of production 
compounding (chunking) to build hierarchy. However, to account for which of many hierarchies 
would be learned, Anzai used domain specific features, such as the pyramids characteristic of 
subgoal states in the Tower of Hanoi puzzle. The applicative hypothesis cracks the problem by 
structuring the language in such a way that hierarchy can be learned via a context-free grammar 
induction algorithm. 

11.4 Summary and formalization 

The arguments in this chapter have been somewhat complicated although the conclusion 
reached is a rather simple one. First, it was shown that procedures need to maintain some notion of 
a current focus of attention. Roughly speaking, the focus of attention is a pointer to a region of the 
current problem state where some reading or writing actions are going on. To maintain the current 
focus of attention, some way to store and transmit focus over time is needed. This facility was 
labelled "data flow." Various ways to construct it were contrasted. 

The simplest facility is based on using registers (globally bound variables) as repositories for 
the current focus of attention. This allows control flow and data flow to be completely 
independent. However, this independence led to the downfall of this approach. The Backup repair 
can be assumed to be a minimally simple way to change control, yet empirical evidence shows that 
whenever it shifts control, it also shifts focus of attention in certain ways. If control flow and data 
flow are as independent as the register hypothesis has them, then there is no way to explain 
Backup's tandem shifts of control and focus. If the two flows are independent, why doesn't Backup 
shift just one and not the other? 

The schema-instance hypothesis revises the register hypothesis in a straightforward way by 
stipulating that focus is somehow stored in close association with the instantiations of goals. Hence, 
whenever a goal is resumed, as by the Backup repair, then its stored focus of attention becomes 
current. In a sense, this hypothesis is a direct response to the difficulties of the register hypothesis. 
It stipulates that whenever control pops, focus of attention is restored too. 

The applicative hypothesis goes one step further. It stipulates the converse: whenever control 
does not pop (or push), then focus of attention does not change. That is, the only way to change 
focus is to push or pop goal instantiations. There is no way to change an instantiation's stored focus 
once that instantiation has been made. The currently executing instantiation can't even change its 
own focus. This extremely strong hypothesis is motivated by an apparent lack of certain kinds of 
bugs. If focus could be changed without changing control, then it ought to be possible for students 
to forget to do such a change. That is, the rule describing the change could be deleted. Yet no 
such deletions have been found. Indeed, when such deletions are carried out, they result in star 
bugs. Hence, to explain the way deletion appears to work, data flow must be applicative. This 
conclusion is captured in the following hypothesis: 
Applicative data flow 

Data flow is applicative. The data flow (focus of attention) of a procedure changes if and 
only if the control flow also changes. When control resumes an instantiation of a goal, the 
focus of attention that was current when the goal was instantiated becomes the current 
focus of attention. 

The impact of this hypothesis on the emerging representation of procedures is fairly simple. Goals 
are equipped with arguments. An argument is a local variable that can be used in the rules that 
define the goal's subgoals. When a rule calls a goal, it provides values for each of the arguments of 
the goal that it is calling. For instance, in the goal Borrow: 

Goal: Borrow (C) Type: AND 

1. (Borrow-into C) 

2. (Borrow-from (Next-column C)) 

3. (Diff C) 

the argument of Borrow is C. When rule 2 calls the goal Borrow-from, it provides a binding for 
Borrow-from's argument by evaluating the focus shifting function Next-column. (The next chapter 
shows that functions are not a good way to represent the shifting of focus; patterns are better.) The 
other rules simply pass the current focus of attention, held in C, to the goals that they call. 
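
As a rough illustration, the applicative version of Borrow can be written as ordinary function 
calls. Everything in the sketch is an assumption made for the example: the dictionary layout of the 
problem state, the column indexing (units first, so the column to the left is the next index), and 
the Python names. The point is only that C is bound once per call and never reassigned; the focus 
shift exists solely as an argument expression, so there is no separate rule that could be deleted. 

```python
def next_column(c):
    """Focus shifting function: the column to the left (next index here)."""
    return c + 1


def borrow_into(state, c):
    state["top"][c] += 10


def borrow_from(state, c):
    state["top"][c] -= 1


def diff(state, c):
    state["answer"][c] = state["top"][c] - state["bottom"][c]


def borrow(state, c):
    """AND goal: every rule runs in order; c is never assigned to."""
    borrow_into(state, c)               # rule 1
    borrow_from(state, next_column(c))  # rule 2: shift embedded in the call
    diff(state, c)                      # rule 3


# 345 - 129 with columns stored units-first. The only side effects are
# writes to the external problem state.
state = {"top": [5, 4, 3], "bottom": [9, 2, 1], "answer": [None, None, None]}
borrow(state, 0)
print(state["top"], state["answer"])
```

After the call the top row reads 15, 3, 3 (units first) and the units answer is 6. 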

Passing intensions versus passing extensions 

Computer science has invented several ways to pass arguments. The most common is call-by-
value. Others are call-by-name, call-for-result, call-by-reference and lazy evaluation. The basic 
dimension of variation is how much of the "meaning" is passed along with the value or datum. 
From a logician's viewpoint, the issue is whether the objects being passed via arguments are 
intensions or extensions. The most common convention is to pass extensions, that is, call-by-value. 
Lazy evaluation is perhaps the closest approximation in computer programming to passing intensions. 

The intension-extension dimension is a valid issue for the theory to examine. Certainly the 
theory has to take a stand on it if its model is going to be implemented on a computer. The issue 
is essentially whether focus of attention is a specific geographic region in the problem state or a 
description of a region in the problem state. One might represent an extensional focus of attention 
by a rectangle in Cartesian coordinates. Intensional focus might be represented by a concatenation 
of all the patterns that have been used to generate it. Unfortunately, the issue of extensional versus 
intensional focus is a very difficult one, with some empirical evidence on both sides. The position 
taken by the theory is to represent focus using parse nodes (see section 2.4), which are halfway 
between extensions and intensions. Appendix S discusses this complex issue. 
Chapter 12 
Searching the Current Problem State 



In the next chapter, the problem of how to represent students' understanding of the problem 
states will be discussed. It will be shown that students impose a structure on their view of the 
current problem state. Their understanding is a sort of task-specific ontology that determines 
what objects "exist" in the sense that they are relevant to the task. The task-specific ontology also 
determines the spatial relationships that "exist" between these objects. Regardless of how the 
students structure their view of the problem state, the students must occasionally search that 
structure in order to locate information needed during problem solving. This chapter discusses the 
search issue, which cuts across all the various task-specific ontologies and representations thereof. 

The previous chapter showed that procedures maintain a visual focus of attention. It also 
discussed a certain kind of focus shifting caused by popping goals. However, there are other kinds 
of focus shifting. For instance, the students write symbols in various locations, presumably shifting 
focus between each writing action. Such focus shifting requires a procedure-directed movement 
through the visual-manipulative space. That is, the procedure must search. 

The search problem is to equip the procedure with facilities that allow it to search the problem 
state (or rather the student's structured version of the problem state). The essence of the search 
problem is how much of the search task to represent explicitly in the procedure, and how much to 
represent below the grain size of the representation as some kind of primitive or underlying facility. 
That is, the search problem concerns where to place the boundary between the cognitive skill under 
study, mathematical problem solving, and the perceptual and motor skills that necessarily 
accompany the exercise of mathematical skill. Three hypotheses will be considered: 

1. Search loops: The procedure employs explicit search loops in order to access and manipulate 
the symbols in the problem state. 

2. Path expressions: The procedure describes a path from the current focus of attention to the 
desired object. A mechanism that is beneath the grain size boundary actually moves the focus 
of attention along the path in order to access the desired object. 

3. Pattern matching: The procedure doesn't need to express anything about how to find a 
desired symbol. Instead, it merely describes what it wants. The description is called a 
pattern. A mechanism called the pattern matcher, which is below the grain size boundary, 
takes care of actually finding the described symbols. 

It will be shown that the evidence is clearly on the side of the third hypothesis. 

12.1 Search loops 

One way to find symbols is for the procedure to contain search loops. A search loop moves 
the focus of visual attention across the problem state, stopping when it reaches the location where 
the procedure will read or write a symbol. For instance, to find the leftmost column, the procedure 
would loop leftward across symbols until it finds a column (a vertical group of digit symbols) that 
has lots of blank space to the left of it. 
To implement a loop requires at least one disjunction: the conditional that says whether to 
stop or to go once more around the loop. According to the one-disjunct-per-lesson hypothesis, each 
such disjunction must be the topic of its own lesson. In itself, this is not bad. It could well be that 
search loops are taught in a series of lessons. 
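
A minimal sketch of such an explicit search loop follows, with the disjunction marked. The list 
encoding of the problem row (None for blank space, a pair of digits for a column) and the function 
name find_leftmost_column are assumptions made for the illustration. 

```python
def find_leftmost_column(row, start):
    """Walk leftward from index start until only blank space lies to the left."""
    i = start
    while True:
        # The loop's one disjunction: stop here, or go once more around.
        if i == 0 or row[i - 1] is None:
            return i
        i -= 1


# 305 - 167 written with blank space on the left; each column is (top, bottom).
row = [None, None, (3, 1), (0, 6), (5, 7)]
print(find_leftmost_column(row, start=4))
```

Starting from the units column at index 4, the loop stops at index 2, the hundreds column. 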

Consider a simple search that walks down a string of algebraic symbols looking for, say, a 
variable. This would require a loop (expressed recursively) across the symbols of the string. The 
loop would be similar in form to the one used in subtraction to walk across the columns of a 
problem. Given one-disjunct-per-lesson learning, the acquisition of the search loop would mimic 
acquisition of the multi-column loop: 

1. The first lesson concerns the simplest case, where the desired variable is the first element of 
the string. 

2. The second lesson has the variable as the second element of the example string, indicating that 
it is not the string's initial element but a variable that is being sought. 

3. The third lesson closes the loop with examples where the variable is anywhere in the string. 

One could imagine a particularly thorough algebra teacher following this curriculum once. 
However, the representation forces such a three-lesson unit to be presented for each new search! 
Clearly, students can learn searches without this kind of teaching. 

The crucial difference between the multi-column loop and the search loop is exactly that the 
multi-column loop requires mutation of the problem state at each step and therefore is not really a 
search loop. To maintain the prediction that the multi-column loop requires several lessons, but the 
search loop does not, a representational construction is needed that distinguishes the two. The 
representation needs a special construction to perform searches. 

12.2 Pattern matching 

In a production system or a Planner-like language, the usual way to access an object is to 
specify the relations that would be true of it. A typical description might be 

(AND 
   (?X ISA PLACE) 
   (?X IN ICOL) 
   (NOT (?X IS/BLANK))) 

This description is used to find a non-blank place in the given column. Traditionally, prefixes are 
used to distinguish search variables such as ?X from goal arguments, such as ICOL, which are 
bound prior to the search. The critical point is that patterns need not specify how to conduct the 
search to locate the described object. They only describe what the search is looking for. The 
interpreter includes a mechanism called the pattern matcher which actually conducts the search. 
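
The division of labor can be sketched as follows. This is a toy matcher written for illustration, 
not Sierra's pattern matcher; the place records, the relation predicates and the binding name COL 
are assumptions. The pattern lists only relations (with a flag for negation), and the generic 
function match does all of the searching. 

```python
# Hypothetical problem state: the units column of 305 - 167 plus one
# place from the tens column. symbol None means the place is blank.
places = [
    {"id": "top-units", "isa": "PLACE", "col": "units", "symbol": 5},
    {"id": "bottom-units", "isa": "PLACE", "col": "units", "symbol": 7},
    {"id": "answer-units", "isa": "PLACE", "col": "units", "symbol": None},
    {"id": "top-tens", "isa": "PLACE", "col": "tens", "symbol": 0},
]


def isa_place(x, bindings):
    return x["isa"] == "PLACE"


def in_col(x, bindings):
    return x["col"] == bindings["COL"]   # COL is bound before the search


def is_blank(x, bindings):
    return x["symbol"] is None


# (AND (?X ISA PLACE) (?X IN COL) (NOT (?X IS/BLANK)))
pattern = [(isa_place, False), (in_col, False), (is_blank, True)]


def match(pattern, places, bindings):
    """Return ids of all places for which every relation holds (or fails, if negated)."""
    return [
        x["id"]
        for x in places
        if all(relation(x, bindings) != negated for relation, negated in pattern)
    ]


print(match(pattern, places, {"COL": "units"}))
```

For the units column this yields the top and bottom digits and skips the blank answer place. The 
pattern itself contains no loop and no stopping condition. 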

Essentially, the argument is that since search loops are not taught explicitly in class, some 
general search mechanism must be in place before instruction begins. Therefore, only the 
descriptions that drive the search need be learned in class and not the search loops themselves. 

Although patterns and pattern matching are the solution that the theory uses for the search 
problem, it is worth mentioning a special search construction that was once used by Sierra. It 
possesses many interesting qualities, but turned out rather poorly compared to patterns and pattern 
matching. 
12.3 Function nests as representations of paths 

The basic idea of Sierra's old representation was to describe a path between the current focus 
of attention and the desired new focus of attention. This path was expressed as a nest of functions. 
For example, if the current focus is a column that requires borrowing, then the Borrow-from goal 
needs to be called on the top digit of the next column to the left. To shift focus, the following 
function nest was used: 

(TopDigit (LeftAdjacentColumn Col)) 

This describes a path. It moves first to the column that is just left of the current focus of attention 
(represented by the variable Col). Then it focuses in on the top digit of that column. 

What makes this representation interesting is that it could acquire new descriptive functions in 
the same way that new subprocedures are acquired. The basic idea is to define functions using the 
same AND/OR structure that procedures' control structure uses. The crux of the representation was a 
construction called an intersection function. An intersection function is expressed as a functional 
AND goal. Thus, LeftAdjacentColumn is expressed using the following intersection function: 

Goal: LeftAdjacentColumn (Col) Type: AND, Function?: true 

1. (Columns) 

2. (Left Col) 

3. (Adjacent Col) 

This function intersects the sets returned by the three subfunctions, Columns, Left and 
Adjacent. Functions must be set-valued in order to make this work. Thus, a function such as 
(Left Col) returns all places that are to the left of the given column. The result of the 
intersection function above would be the intersection of all columns, all places to the left of the 
given column, and all places adjacent to the given column. This means that it returns a singleton 
set consisting of the left-adjacent column. 
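
A small sketch shows the set-valued reading. The table of places, their horizontal positions and 
the helper intersection_function are assumptions made for the example; only the three 
subfunction names come from the goal definition above. 

```python
# Hypothetical places: name -> (kind, x), with x increasing to the right.
PLACES = {
    "col-hundreds": ("column", 0), "col-tens": ("column", 1), "col-units": ("column", 2),
    "top-hundreds": ("digit", 0), "top-tens": ("digit", 1), "top-units": ("digit", 2),
}


def columns(col):
    """All columns (the argument is ignored)."""
    return {name for name, (kind, _) in PLACES.items() if kind == "column"}


def left(col):
    """All places to the left of the given column."""
    x0 = PLACES[col][1]
    return {name for name, (_, x) in PLACES.items() if x < x0}


def adjacent(col):
    """All places exactly one position away from the given column."""
    x0 = PLACES[col][1]
    return {name for name, (_, x) in PLACES.items() if abs(x - x0) == 1}


def intersection_function(*subfunctions):
    """Build a functional AND goal: intersect the sets its rules return."""
    def goal(col):
        result = None
        for sub in subfunctions:
            found = sub(col)
            result = found if result is None else result & found
        return result
    return goal


left_adjacent_column = intersection_function(columns, left, adjacent)
print(left_adjacent_column("col-units"))
```

Applied to the units column, the intersection is the singleton set containing the tens column. 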

Intersection functions can have the same syntax as goals. Hence, learning new intersection 
functions would be similar if not identical to learning new subprocedures. Moreover, rule deletion 
would be identical to "forgetting" part of a term's definition. The whole concept seems quite 
tractable. In fact, Sierra used this representation for quite a long time, as representations go. A 
complete learner was built for it. That is where the fatal flaw was found. 

There are too many paths between any two points 

By the time the later lessons of subtraction are encountered, a number of intersection 
functions such as LeftAdjacentColumn have been learned. The richness of this set allows long 
and silly nests of functions to be induced for descriptions. For example, the usual borrow-from 
place, 

(TopDigit (LeftAdjacentColumn Col)) 

could also be described by any of the following nests: 

1. (TopDigit 
      (ColumnOfDigit 
         (LeftAdjacentDigit 
            (BottomDigit Col)))) 

2. (RightAdjacentDigit 
      (TopDigit 
         (LeftAdjacentColumn 
            (LeftAdjacentColumn Col)))) 

3. (AboveAdjacentDigit 
      (BottomDigit 
         (LeftAdjacentColumn Col))) 



As these nests illustrate, there are many paths to get from one place to another. Path induction will 
find all possible paths from the current focus of attention to the place where attention is next to be 
focused. Even when cyclic paths are removed, there are still far too many paths. Even when 
minimal length paths are the only paths induced, there are many paths. Moreover, all the paths are 
roughly equivalent in that procedures with different paths are observationally indistinguishable. 

To avoid this redundancy, one really wants to merge those paths. One wants to describe the 
network instead of all the paths traversing it. That is exactly what patterns do. If a pattern consists 
of a set of relations among variables, the relations can be viewed as labelled edges for a directed 
graph with the variables serving as the graph nodes. Thus, patterns express the whole network of 
relationships between current and successor foci, while a function nest expresses only one path 
through the network. The reason that path induction generates so many silly, redundant expressions 
is that it generates all possible paths between two nodes in the network. Clearly, learning is better 
represented as inducing the network itself. Thus, it need not choose between alternative and nearly 
equivalent paths. 
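
The multiplicity of paths is easy to demonstrate on a toy network. In the sketch below the node 
names (C0 for the current column, C1 and C2 for the columns to its left, T and B for top and bottom 
digits) and the edge tables are assumptions; the function names are the ones used in the nests 
above. The enumerator lists every cycle-free nest, up to a length bound, that leads from the current 
column to the borrow-from place. 

```python
# Each function is a table of labelled edges: source node -> target node.
EDGES = {
    "TopDigit": {"C0": "T0", "C1": "T1", "C2": "T2"},
    "BottomDigit": {"C0": "B0", "C1": "B1", "C2": "B2"},
    "ColumnOfDigit": {"T0": "C0", "B0": "C0", "T1": "C1",
                      "B1": "C1", "T2": "C2", "B2": "C2"},
    "LeftAdjacentColumn": {"C0": "C1", "C1": "C2"},
    "LeftAdjacentDigit": {"T0": "T1", "T1": "T2", "B0": "B1", "B1": "B2"},
    "RightAdjacentDigit": {"T1": "T0", "T2": "T1", "B1": "B0", "B2": "B1"},
    "AboveAdjacentDigit": {"B0": "T0", "B1": "T1", "B2": "T2"},
}


def nests(start, goal, max_len):
    """All cycle-free function nests of at most max_len functions from start to goal."""
    found = []

    def walk(node, expr, seen, depth):
        if node == goal:
            found.append(expr)
            return
        if depth == max_len:
            return
        for fn, table in EDGES.items():
            nxt = table.get(node)
            if nxt is not None and nxt not in seen:
                walk(nxt, f"({fn} {expr})", seen | {nxt}, depth + 1)

    walk(start, "Col", {start}, 0)
    return found


paths = nests("C0", "T1", max_len=4)
for p in paths:
    print(p)
```

The output includes the usual nest and the three alternatives listed above, along with others such 
as (LeftAdjacentDigit (TopDigit Col)). The EDGES table itself is the single network that a pattern 
would describe. 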

Paths and relaxation 

The discussion above is aimed mostly at establishing a different perspective on patterns rather 
than criticizing the path framework. The damning problems with paths have to do with the fact 
that they are hard to "relax." A pattern can be relaxed by deleting one or more of its relations, 
allowing it to match in more situations than it used to. However, it doesn't work to delete one 
function from a function nest. Such deletions generally generate nonsensical paths. 

Relaxation is used in several ways. It is used by the Refocus repair in order to find a new 
argument for a stuck action that is "similar" to the argument value that causes the impasse. 
Relaxation is also used in learning to generalize descriptions in certain situations. Chapter 18 
discusses these issues. Suffice it to say that paths are a poor representation for locative descriptions 
whenever those descriptions must be revised. As long as the descriptions never change, which is not 
the case here, then path representations work fine. 
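
Relaxing a pattern can be sketched in a few lines. The digit records, the relation names Top, 
LeftOf and Adjacent, and the function match are assumptions made for the illustration; they are 
not the relations the theory finally adopts. Deleting one relation from the pattern leaves a 
well-formed pattern that simply matches more places. 

```python
# Hypothetical digit places; x increases to the right.
DIGITS = [
    {"name": "top-hundreds", "row": "top", "x": 0},
    {"name": "top-tens", "row": "top", "x": 1},
    {"name": "top-units", "row": "top", "x": 2},
    {"name": "bottom-hundreds", "row": "bottom", "x": 0},
    {"name": "bottom-tens", "row": "bottom", "x": 1},
    {"name": "bottom-units", "row": "bottom", "x": 2},
]

# Relations between a candidate digit and the current column's position.
RELATIONS = {
    "Top": lambda d, col_x: d["row"] == "top",
    "LeftOf": lambda d, col_x: d["x"] < col_x,
    "Adjacent": lambda d, col_x: abs(d["x"] - col_x) == 1,
}


def match(pattern, col_x):
    """Names of all digits satisfying every relation in the pattern."""
    return [d["name"] for d in DIGITS
            if all(RELATIONS[r](d, col_x) for r in pattern)]


full = ["Top", "LeftOf", "Adjacent"]             # the borrow-from place
relaxed = [r for r in full if r != "Adjacent"]   # one relation deleted

print(match(full, col_x=2))
print(match(relaxed, col_x=2))
```

With the current column at the units position, the full pattern matches only the top digit of the 
tens column; the relaxed pattern also admits the top digit of the hundreds column. 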

12.4 Summary and formalization 

It is often the case that procedures must search the problem state to find information. If this 
search were represented explicitly in the procedure, then it would occasionally take the form of a 
search loop. Such loops would require at least one disjunction in order to terminate the loop when 
the desired information is found. However, the one-disjunct-per-lesson hypothesis entails that each 
time such a loop is learned, it would have to be learned in a short sequence of lessons. In many 
cases, such lessons are not found in today's curricula. Hence, to maintain the truth of one-disjunct-
per-lesson learning, a facility for doing search that is beneath the grain-size boundary must be 
added to the representation. A hypothesis to capture this conclusion is: 

Pattern 

Procedures have patterns which are matched against the current problem state. 

There are many issues introduced by the addition of patterns to the representation. Chapter 13 
discusses what the set of relations should be for patterns and how the student's understanding of the 
problem state, which is what patterns match against, should be represented. Chapter 14 discusses 
the expressive power that patterns should be given. They must have at least relations, such as 
(LeftOf x y) or (Column x), in order to describe the kinds of information that search seeks. 
However, it is so far an open question whether they have logical constructions such as quantifiers, 
disjunctions and so on. Chapter 17 shows that procedures need two kinds of patterns. Test patterns 
are used for the applicability conditions of rules. Fetch patterns are used for the focus shifting that 
occurs when a rule calls a subgoal. Chapters 18 and 19 discuss how patterns are acquired. 

This chapter discusses how to represent a student's understanding of the current problem 
state, or rather, that portion of the current problem state that the student considers relevant to the 
problem solving task. It is assumed that the student knows or believes that only certain kinds of 
objects and relations are relevant to learning and problem solving in the given task. That is, it is 
assumed that the student has a task-specific ontology which says what kinds of objects and relations 
exist in a given problem state when that state is viewed in a task-oriented way. This ontology is 
specific to the task the procedure solves because the kinds of objects and relations that are relevant 
vary with the task. The relevant objects for subtraction are different than those for algebra, for 
tic-tac-toe, or for drawing cartoons, even though all these tasks are carried out on paper. Indeed, the 
task-specific ontology may even vary across subjects performing the same task. This variability is 
the essential problem. It will be shown that the choice of objects and relations used in an inductive 
learning model has a direct effect on the output of the learner. But the ontology is task-specific, so 
the theorist must provide it, at least in part. The theorist can control the predictions of the model 
by controlling the objects and relations used to represent the student's task-specific ontology. If the 
theorist is allowed total freedom in choosing the model's objects and relations, then the empirical 
adequacy of the theory may depend more on the cleverness of the theorist than on the principles of 
the theory. The theory may have little explanatory value. 

To summarize, there are two horns to the dilemma: Since the student's task-specific ontology 
varies across tasks (and perhaps across students as well), the theory must leave its formal expression 
as an open parameter in the model. Some tailoring must be allowed. On the other horn, if the 
ontology parameter is too unconstrained, the theory may be vacuous. The problem is to provide 
some way to constrain the representation of task-specific ontologies. This chapter discusses three 
solutions to that problem: 

1. Problem state spaces: The theory places no constraints on the representation of task-specific 
ontologies. For each task (or possibly each student), the theorist provides a problem state 
data structure, some functions and relations for accessing it, and some state change operators. 

2. Aggregate object definitions: Under this approach, the theory asserts that all students have the 
same conception of two-dimensional space for all mathematical symbol manipulation tasks. 
That spatial conception is based on a few fundamental concepts, including adjacency, 
sequence and the compass points: horizontal, vertical and the two diagonals. The variation in 
ontologies across tasks and individuals is confined to aggregate objects. That is, different tasks 
will group the symbols differently. Subtraction cares about columns but algebra doesn't. 
Different students might group symbols differently. Some algebra students group "2+3x" 
such that "2+3" is an aggregate object. Although grouping strategies may vary, the set of 
basic spatial relations and the set of state change operators are both constant. A subtraction 
problem is a horizontal sequence of columns; an algebraic expression (e.g., "-2+3x²+y") is 
a horizontal sequence of signed terms. The model uses exactly the same spatial relations to 
describe both cases. Essentially, the spatial relations are a fixed, universal set. The objects 
vary across tasks and individuals. The set of state change operators is also a fixed, universal 
set. To tailor the ontology parameter of the model, the theorist provides only a set of 
aggregate object definitions. 

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3. Grammars, llijs approach takes the same stand on the universal character of spatial relations 
i\n<i state change operators as the object dcrinition approach. However, it expresses that 
h>poi)icsib diflcrcntly. Ilic cssentjal diflcrencc is that aggregate objects can be defined in 
tciTns of otiicr aggregate objects, hi particular an object is defined by a set of grammar niles 
th.it ma> mention other objects. The formalism for grammar rules embeds the ftjndamental 
spatial relations, adjacency, sequence and the compass points. 'ITie grammar also establishes a 
part-whole hicropchy of aggregate objects. This hierarchy is lacking from the object definition 
approach. !o tailor the ontology parameter of the model* the theorist provides a grammar. 

All three approaches were implemented in Sierra at various times. They are only a small sample of
the many ways that ontologies can be represented. More research is needed in this crucial area.
For the domain of mathematical symbol manipulation skills, it will be shown that the grammar
approach yields the best theory. However, it is not yet clear whether this general approach will
work in other domains. I would hesitate to say that a grammar is the right way to represent a
nuclear power plant operator's task-specific ontology. Clearly the plant operator's ontology would
not be a simple two-dimensional grammar of the kind used in this theory. Perhaps it would be
more like the device topologies used in de Kleer's work on causal models of physical devices
(de Kleer, 1979; de Kleer & Brown, 1981). This will be a critical issue when the acquisition of
procedural skills is studied in domains other than symbol manipulation.

13.1 Problem state spaces

One approach to representing the subject's task-specific ontology is to use a high-level
structural description of the problem state. For instance, the problem state might be formalized as
operator-precedence trees for algebra or as matrices for arithmetic. One must also provide a set of
descriptive terms for the procedure to use in accessing parts of the problem state data structure or
in testing its properties. Examples of such descriptive terms are a function that retrieves the left
side of a given equation, or a predicate that is true of two columns when they are adjacent to each
other.

This approach is essentially a projection of Newell and Simon's problem space approach onto
the problem state dimension (Newell & Simon, 1972). Problem spaces contain information that
doesn't directly represent the current state of the visible problem. A problem space for chess
contains information about the previous moves in the game, for instance. The problem state space
approach is just a restriction of the problem space approach. It includes only information about the
problem state. That is, it represents the subject's internal representation of the current external state
of the problem.

A fundamental tenet of the problem space approach used by Newell and Simon is that
problem spaces vary across individuals and tasks. The problem state space therefore is left as a
model parameter that can be tailored by the theorist to fit individual subjects. This is its Achilles
heel. For several years while repair theory was being developed, Sierra's solver used problem state
spaces. The solver's performance was relatively insensitive to variations in the problem state space.
However, when the learner was developed, it showed enormous sensitivity to the problem state
space. Many important details of the student procedures induced by the learner were controlled
solely by the problem state space. Since the learner was basically an inducer, slight shifts in how
problems were described were lifted by generalization into the acquired procedure. Since the
problem state space has to be fitted by the theorist to the data, the theorist can tailor the learner's
output to be just about anything. This gives the theory a great deal of tailorability. Indeed, it can
be argued that it gives it too much.

The generation of Always-Borrow-Left depends on the problem state space

The problem state space has enough tailorability that it even affects the generation of a
premier bug of induction: Always-Borrow-Left. This subtraction bug is a particularly clear example
of how induction can generate bugs as well as correct procedures. Its generation assumes that the
student has learned only part of the lesson sequence for subtraction. In particular, it assumes that
the student has just been introduced to borrowing. In all textbooks that I know of, the lesson that
introduces borrowing uses only two-column problems, such as a:

          5               5           2
   a.     6 5     b.    3 6 5    c.   3 6 5
        - 1 9         - 1 0 9       - 1 0 9

          4 6           2 5 6         1 6 6

Multicolumn problems, such as b, are not used. Consequently, the student has insufficient
information for unambiguously inducing where to place borrow's decrement. The correct placement
is in the left-adjacent column, as in b. However, two-column problems are also consistent with
decrementing in the leftmost column, as in c. Given only two-column examples, induction can't
discriminate between the two placements. The bug Always-Borrow-Left results from the learner
taking the leftmost generalization, rather than the left-adjacent generalization, which is the correct
one. Always-Borrow-Left produces the kind of solutions shown in c. The bug occurred six times in
the Southbay sample of 375 students with bugs. Its existence is prime evidence that induction plays
an important role in procedure acquisition.
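
The ambiguity described above can be sketched in a few lines of Python. This is only an
illustration, not Sierra's inducer: the descriptor names, the column indexing and the function
consistent_descriptors are all assumptions made for the example. It shows that a two-column
worked example is consistent with every descriptor that happens to pick the tens column, so the
vocabulary supplied by the theorist decides which generalizations can be induced.

```python
# Columns are indexed left to right, so index 0 is the leftmost column.
# Each descriptor maps (number of columns, index of the column that needs
# the borrow) to the index of the column it says should be decremented.
# The descriptor names are illustrative assumptions, not Sierra's primitives.
DESCRIPTORS = {
    "leftmost": lambda n_cols, borrower: 0,
    "left_adjacent": lambda n_cols, borrower: borrower - 1,
    "tens": lambda n_cols, borrower: n_cols - 2,
}


def consistent_descriptors(examples, vocabulary):
    """Return the descriptors in `vocabulary` that agree with every example.

    Each example is (n_cols, borrower, decremented), where `decremented`
    is the column that the worked example actually decrements.
    """
    return sorted(
        name
        for name in vocabulary
        if all(
            DESCRIPTORS[name](n_cols, borrower) == decremented
            for n_cols, borrower, decremented in examples
        )
    )


# A two-column worked example like a: the units column (index 1) borrows
# and the tens column (index 0) is decremented.
two_column_lesson = [(2, 1, 0)]

# With "leftmost" in the vocabulary, both generalizations survive.
print(consistent_descriptors(two_column_lesson, ["leftmost", "left_adjacent"]))
# Without it, only the correct generalization can be induced.
print(consistent_descriptors(two_column_lesson, ["left_adjacent"]))
# Adding "tens" admits a third generalization.
print(consistent_descriptors(two_column_lesson, ["leftmost", "left_adjacent", "tens"]))
```

Under these assumptions, the learner's candidate generalizations change only because the
vocabulary changes; the training example stays the same.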

The problem state space has total control over whether or not Always-Borrow-Left is
generated. If the problem state space does not include "leftmost column" as one of the descriptive
terms, then Always-Borrow-Left is not induced. To cover the data, the problem state space has to
have "leftmost column" in it. Yet if the problem state space has all plausible descriptive terms in it,
then induction will generate star bugs. For instance, if "tens column" is a descriptive term in the
problem state space, then two-column worked examples will generate a star bug that could be called
*Always-Borrow-From-Tens-Column. Its work is shown in d and e:

            14                     3
   d.     6  5 6      e.     6 5 4 1
        - 1  9 0           - 1 9 0 0

          5  5 6             5 6 4 1

The star bug decrements the top digit in the tens column even when that column has already been
answered, as in e, or is in the process of getting answered, as in d. The model should not generate
*Always-Borrow-From-Tens-Column. Therefore, the problem state space should not include the
descriptive term "tens column."

These examples demonstrate that the empirical adequacy of the theory is highly sensitive
to the problem state space. If the theorist tailors the problem state space, then many bugs and star
bugs are not explained; they are merely represented by the presence or absence of descriptive terms.
Adjusting the problem state space to include "leftmost column" and exclude "tens column" doesn't
explain why one and not the other is a salient descriptor to students. Tailorability reduces the
explanatory value of the theory.

Problem state spaces reduce the forecasting ability of the theory

There is a second reason that a tailorable problem state space decreases theoretical adequacy.
When the theory is applied to a new task domain, it has to be given a new problem state space.
Suppose that data have not been collected for this task domain. With nothing to tailor the problem
state space to, the theorist must rely on intuition in formulating the problem state space. Since
there is no reason to believe in the theorist's guess for a problem state space, there is no reason to
believe the theory's predictions, which depend directly on the problem state space. The predictions
merely reflect the theorist's intuitions. Hence, the theory is useless for applications that wish to use
it instead of extensive data collection projects. As an example of such an application, suppose
someone had just invented some paper-and-pencil tools for calculation and a curriculum to teach
people how to use them (e.g., a new way to solve fraction addition problems). This theory would
be nearly useless for assessing the quality of that curriculum given that the problem state space had
to be guessed. Tailorability reduces the forecasting ability of the theory.



13.2 Aggregate object definitions: fixed spatial relations

During the period that Sierra used problem state spaces, the set of notational terms was
adjusted in order to maximize the empirical adequacy of the learner's predictions. The resulting set
of primitives had several regularities. For instance, there were several clusters of primitives that
expressed the idea of a sequence of objects. A horizontal sequence of columns was represented by
a cluster of primitives consisting of:

leftmost column 
left adjacent column 
column A is left of column B 
rightmost column 
right adjacent column 

There were several of these sequence clusters. It seemed that a powerful underlying concept,
sequence, was not being captured by the representation in its most general form. The next
hypothesis, the aggregate object definition hypothesis, aims to rectify this.

The basic idea behind the aggregate object definition approach is to split off general spatial
notions from task-specific notions. The concepts that vary across tasks and subjects are mostly
concerned with aggregation of symbols into groups. This part of the problem state space has to be
left open for the theorist to adjust. Everything else can be fixed. In particular, spatial relations
are represented by the following set of predicates on objects:

Topological

   (Adjacent? x y)       x is adjacent to y
   (Inside? x y)         x is inside y

Sequence

   (Last? S x)           x is the last element of sequence S
   (First? S x)          x is the first element of sequence S
   (Middle? S x)         x is neither the first nor last element of S
   (Ordered? S x y)      x is before y in the sequence S

Compass points

   (LeftOf? x y)         x is to the left of y
   (Above? x y)          x is above y
   (Superscript? x y)    x is diagonally up and right of y
   (Subscript? x y)      x is diagonally down and right of y

There are conventions for ordering sequences. The first element of a horizontal sequence is the
leftmost element, and the first element of a vertical sequence is the top element.

The spatial relations are a fixed set, provided by the model. To complete the representation
of a student's task-specific ontology, the theorist provides a set of type and part relations for each
aggregate object and its parts. For instance, the theorist could define an algebraic equation as an
aggregate object by defining one type relation and three part relations:

   (Equation? Q)    true if Q is an equation
   (Lhs x Q)        x is the expression on the left of the equation Q
   (Sign x Q)       x is the sign (usually =) between the equation's halves
   (Rhs x Q)        x is the expression on the right of the equation Q

As argued in chapter 12, patterns are used for all interface operations between the procedure and
the problem state space. In particular, they are used as the applicability conditions for rules and for
shifting the focus of attention when a rule calls a goal. Patterns contain both spatial relations and
the relations provided by the theorist to define aggregate objects. The following pattern might be
used to test if an algebraic equation-solving problem has been completed:

   (Equation? LQ)
   (Equation? Q)
   (Last? S LQ)
   (Middle? S Q)
   (Above? Q LQ)
   (Lhs x LQ)
   (Variable? x)
   (Rhs y LQ)
   (Expression? y)

This pattern describes two equations, LQ and Q. They are vertically aligned in some sequence S,
and LQ is last in the sequence. The left side of LQ is a variable, and the right side is an algebraic
expression. This pattern would match b but not a:

   a.  2x+5 = 9        b.  2x+5 = 9
       2x = 9-5            2x = 9-5
                           2x = 4
                           x = 4/2

Both problem states have a vertical sequence of equations, but the pattern doesn't match problem
state a since its vertical sequence of equations does not end with an equation that has a single
variable as the left-hand side.
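
As a rough illustration of how such a pattern constrains its bindings, the sketch below checks
the completion pattern against problem states a and b. The representation is an assumption made
for the example: a problem state is a top-to-bottom list of (left side, right side) strings, a
variable is a single lowercase letter, and any non-empty right side counts as an expression.
Sierra's actual matcher is not shown in the text.

```python
import re


def is_variable(text):
    """Assumed test for (Variable? x): a single lowercase letter."""
    return re.fullmatch(r"[a-z]", text.strip()) is not None


def is_expression(text):
    """Assumed test for (Expression? y): any non-empty string."""
    return bool(text.strip())


def problem_completed(equations):
    """Search for bindings of LQ and Q that satisfy the completion pattern.

    `equations` is the vertical sequence S, listed top to bottom, with each
    equation given as a (lhs, rhs) pair of strings.
    """
    last_index = len(equations) - 1
    for i, (lhs, rhs) in enumerate(equations):
        if i != last_index:                      # (Last? S LQ)
            continue
        for j in range(len(equations)):
            if not 0 < j < last_index:           # (Middle? S Q)
                continue
            if not j < i:                        # (Above? Q LQ)
                continue
            if is_variable(lhs) and is_expression(rhs):
                return True                      # (Lhs x LQ), (Rhs y LQ)
    return False


state_a = [("2x+5", "9"), ("2x", "9-5")]
state_b = [("2x+5", "9"), ("2x", "9-5"), ("2x", "4"), ("x", "4/2")]

print(problem_completed(state_a))  # False: the last left side is "2x"
print(problem_completed(state_b))  # True: the last left side is "x"
```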

The point of fixing the spatial relations is to remove a degree of freedom from the
representation of task-specific ontologies. The only freedom left is the aggregate object definitions.
In a sense, the theory has been augmented with a micro-theory of two-dimensional space. This
micro-theory increases the explanatory value of the theory as a whole. For instance, one of the
principles of the micro-theory is that whenever there is a sequence, the first and last elements of the
sequence are salient to the student. If the inducer sees a problem state where x is the first element
of the sequence S, then (First? S x) will always be induced as a part of the description of x. In
particular, if x is the leftmost column in a sequence of columns, the induced description of x will
include the constraint that it is the leftmost column. This explains why Always-Borrow-Left is a
bug. The column it borrows from is a leftmost column. Induction is forced to predict the bug,
given a training set of two-column borrow examples. The micro-theory explains the bug from the
general principle that people always notice the boundary points of linear arrangements of objects.

13.3 Grammars

With the addition of a micro-theory of space and a symbol-level grain size, the tailorability of
the model is drastically reduced. The way that the model is adapted to represent varying
conceptions of notational syntax is limited to defining new kinds of aggregate objects and their
associated part relations. The set of primitive spatial relations and primitive writing operations
remains the same.

This section discusses several hypotheses about the meaning of pattern relations based on this
representation of task-specific ontologies. In particular, what are the runtime implications of the
spatial relations and the relations that define aggregate objects? From the perspective of building a
computer model, the issue is how to define the meaning of the relations that are used in patterns.
If the pattern has (Column x) in it, how does the pattern matcher enforce this constraint on the
bindings of the pattern variable x? That is, what is the relationship between pattern relations and
the current problem state?

The myopia problem 

The most straightforward relationship between patterns and problem states is simply to
provide each term with some geometric definition. For example, the spatial predicate
(Adjacent? x y) might be defined to mean that there is nothing but blank paper between x and
y. By this definition, the 3 and the x are adjacent in both a and b:

                        3   x
   a.  3x=6        b.   - = -
                        4   8

Not only are the 3 and the x separated only by blank space, but they are the same distance apart.
By any local definition of Adjacent?, the 3 and the x are adjacent in both a and b. This strange
interpretation of b is not one that subjects make.

This myopic behavior was discovered in an early version of Sierra. At first I thought it was
an instance of the old "how near is near" problem that has plagued AI for at least a decade
(Denofsky, 1976). How close do two symbols have to be to be adjacent? To my knowledge, no one
has solved the "how near is near" problem, if indeed "solving" it makes any sense in the abstract.
There is a collection of hacks for getting around it, mainly involving fudge factors and
manipulations of the grain size of the coordinate system. However, as various increasingly desperate
hacks were applied to fix Sierra's myopia, it became clear that the approach of using local geometric
definitions for terms was just too local to be workable.

The robustness problem 

Another approach is to maintain locality of a sort by using definitions that search for maximal
conditions within a neighborhood. For instance, (Adjacent? x y) could be defined as "there is
no object which is closer to x than y, and closer to y than x." In other words, x and y are the
closest objects to each other. This would correctly rule out adjacency for 3x in

   3   x
   - = -
   4   8

since the closest object to the 3 is the equals sign, not the x. These sorts of locally maximal
definitions are an improvement over absolute definitions, but even they have problems.

For one thing, they require special cases for symbols that aren't roughly circular in shape, as
the digits and letters are. A fraction bar is one such symbol. One wants 3 and x to be adjacent in
the following fraction:

3 X 
T' 

However, the bar is the closest symbol to the 3, not the x. Bars have to be made an exception to
the rule in order to get the 3 and the x to be adjacent.

A second problem is a lack of robustness. A little sloppiness in the placement of symbols
changes the truth values of predicates defined with locally maximal definitions. For instance, in the
first line of

   3 x = 6
    y = 2

the 3 and the x are not adjacent because the y on the second line is too close.

Adjacency is the foundation of symbol groups in mathematical notation. It plays the role of
string adjacency in text parsing or temporal adjacency in speech understanding. If adjacency can't
be well defined, then the chances of adequate definitions for other notational terms are poor
indeed.
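
The two local definitions of adjacency discussed above can be compared in a small sketch. All
coordinates, the distance threshold and the function names are assumptions chosen for
illustration, and the fraction bars are deliberately left out of the fraction state. The absolute
definition accepts the 3 and the x in both layouts (the myopic reading), while the
mutual-nearest definition rejects them in the fraction equation and also in a sloppily written
two-line state.

```python
import math


def absolute_adjacent(state, a, b, threshold=1.0):
    """Absolute local definition: the two symbols are within a fixed distance."""
    return math.dist(state[a], state[b]) <= threshold


def nearest(state, a):
    """Name of the symbol closest to symbol `a`."""
    return min(
        (name for name in state if name != a),
        key=lambda name: math.dist(state[a], state[name]),
    )


def mutual_nearest_adjacent(state, a, b):
    """Locally maximal definition: each symbol is the other's closest object."""
    return nearest(state, a) == b and nearest(state, b) == a


# "3x = 6" on one line (assumed symbol centers).
linear = {"3": (0.0, 0.0), "x": (1.0, 0.0), "=": (2.5, 0.0), "6": (4.0, 0.0)}

# "3/4 = x/8" with the fraction bars omitted (assumed symbol centers).
fraction = {
    "3": (0.0, 0.4), "x": (1.0, 0.4), "=": (0.5, 0.0),
    "4": (0.0, -0.4), "8": (1.0, -0.4),
}

# "3 x = 6" with "y = 2" written slightly too close underneath.
sloppy = {
    "3": (0.0, 0.0), "x": (1.2, 0.0), "=": (2.4, 0.0), "6": (3.6, 0.0),
    "y": (0.9, -0.7), "=2": (2.1, -0.7), "2": (3.3, -0.7),
}

print(absolute_adjacent(linear, "3", "x"), absolute_adjacent(fraction, "3", "x"))
print(mutual_nearest_adjacent(linear, "3", "x"),
      mutual_nearest_adjacent(fraction, "3", "x"),
      mutual_nearest_adjacent(sloppy, "3", "x"))
```

Small changes to the assumed coordinates flip the results, which is the lack of robustness
described above.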

The local ambiguity problem 

There is a second kind of problem with local definition of relations. It involves local
ambiguity. A local ambiguity occurs when there are several interpretations for a certain subset of
the problem state, yet all but one interpretation fail to fit into an interpretation of the whole
problem state. That is, there is ambiguity when only a part of the scene is considered, but the
ambiguity disappears when the whole scene is considered. Take, as an example, the string "2+3x."
One interpretation is that "2+3" is an expression. While locally correct, this interpretation cannot
be extended to include the x (assuming a correct syntax for algebraic notation). To see why filtering
local ambiguities is important, suppose the procedure wants to extend the expression by appending
"+5" to its right end. It uses some pattern to fetch the current expression. If the page bears
"2+3x" it is possible that the pattern matcher will return "2+3" as the expression. Appending
"+5" to this will cause x to be overwritten. This is not a mistake that people make. Clearly, the
definitions of pattern matching must be modified so that such local ambiguities will not be
returned.

One solution to the local ambiguity problem is to have each term's definition check the
context of its group of symbols as part of determining whether it is true of them. For instance, an
algebraic expression with two terms could be defined as:

   (BinaryExpr x y) =
      (Term? x) ∧
      (Term? y) ∧
      (Adjacent? x y) ∧
      (LeftOf? x y) ∧
      [∀z (= z y) ∨ ¬(Adjacent? z x) ∨ ¬(LeftOf? z x)] ∧
      [∀z (= z x) ∨ ¬(Adjacent? y z) ∨ ¬(LeftOf? y z)]

The intent is for this relation to match "2+3x" but not "2+3." The definition uses universal
quantifiers to check that there is nothing just left or just right of the expression. Thus, it rules out
"2+3" because it sees the "x" just to the right of it.

Actually, the definition is a little too strong. It must allow certain symbols at its sides, such as
equal signs, for instance. So each disjunction would need to be extended with literals such as

   ... ∨ (Equal (Read z) '=) ∨ (Equal (Read z) '])

In fact, whenever a new notational symbol is learned, all definitions, such as this one, for aggregate
objects that can be located adjacent to the new symbol must have their definitions updated. For
instance, when such a symbol is learned, the definition above would have to be extended. This would
make it difficult, perhaps, to formulate a plausible theory for incremental learning of notation.

Actually, using context-sensitive definitions would probably not work in general. Consider the
string "2+3[5-8]". The extended definition for BinaryExpr given earlier allows brackets on its sides.
So it correctly calls "5-8" an expression. However, it also calls "2+3" an expression. In order to
discriminate between the two, it needs to look further than one symbol away.

Whether or not this approach of using look-ahead in local definitions of terms will solve the
local ambiguity problem in general amounts to asking whether every mathematical notation has an
LR(k) grammar (or equivalently, whether it is a deterministic language). It has been shown that all
precedence languages are deterministic languages (Floyd, 1963). The class of precedence languages
takes in the sort of linear mathematical expressions used in computer languages. However, it is
uncertain whether two-dimensional mathematical notation is a precedence language or even a
deterministic language. In short, there is reason to doubt whether the local definition approach will
always be sufficiently powerful to express all mathematical notation. Of course, having definitions
with many symbols of look-ahead may grossly complicate a theory of their acquisition.

Using grammars to solve problems with myopia and local ambiguity

When relations are defined using only local geometric knowledge, relations such as "term"
and "expression" have no relationship to each other except perhaps for co-occurrence in some
patterns in the procedure. Consequently, there is no way to prune objects which satisfy their local
definition but fail to participate in a global parse of the image. If notational objects are defined by
a grammar, the definitions of objects refer to other objects by name. This provides information
linking the objects together. It can be used to solve problems of myopia and local ambiguity during
pattern matching.

The basic idea is that relations can have fairly sloppy, individual definitions if the definitions
are used in concert. A relation in a pattern matches a form in the problem state only if that form
participates in a global parse of the problem state. The grammar as a whole acts as a filter on
possible instantiations of the relations. Hence, the local parse of "2+3" as an algebraic expression
in "2+3x" is ruled out because it does not fit into a global parse of the whole problem state. Thus,
the grammar is used to filter out local ambiguities. Similarly, grammars solve the myopia problem
of pattern matching. In

   3 x = 6
    y = 2

the 3 and the y are not adjacent because there is no global interpretation of the problem state that
groups those two symbols together as a unit. A local definition of Adjacent? might suggest that
they are adjacent, but the global interpretation would filter this suggestion out before the pattern
matcher can mistakenly retrieve it.

A technical detail: maximal parses

To insist that the problem state have a complete parse is a little too strong. The current
problem state may not have a complete parse. This is often the case in the midst of problem
solving. En route from "2+3x" to "2+3x+5", the problem state is "2+3x+", which is not a
well-formed algebraic expression. Since it hasn't a complete parse, pattern matching would not be
permitted to access any part of it. So the complete-parse restriction is a tad too strong.

One stipulation that works is to specify that objects must participate in a maximal parse. A
parse is maximal if the group it covers is not a proper subset of any other parse's cover. Since
"2+3" is a proper subset of "2+3x", it is not a maximal parse of the problem state "2+3x+".
Hence it is not accessible to matching. There are stipulations other than this one that work, but
they seem to produce exactly the same filtering as the maximal parse stipulation. At any rate, some
kind of global coherency is necessary as a filter on matching, although currently the details of what
that coherency is don't seem to be too important.
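
The maximal parse stipulation can be stated compactly in code. The sketch below is an
assumption-laden illustration rather than Sierra's parser: the candidate parses for the problem
state "2+3x+" are written out by hand (symbol positions 0 to 4, with the trailing "+" at
position 4 left uncovered), and the node labels are invented for the example. Only the
subset test itself follows the definition in the text.

```python
def maximal_parses(parses):
    """Keep the parses whose cover is not a proper subset of another's cover."""
    return [
        parse
        for parse in parses
        if not any(parse["cover"] < other["cover"] for other in parses)
    ]


def accessible_objects(parses):
    """Objects a pattern may match: the nodes of the surviving parse trees."""
    objects = set()
    for parse in maximal_parses(parses):
        objects.update(parse["nodes"])
    return objects


# Hand-written candidate parses of "2+3x+" (hypothetical node labels).
candidate_parses = [
    {"nodes": ["EXPR 2+3", "TERM 2", "TERM 3"],
     "cover": frozenset({0, 1, 2})},
    {"nodes": ["EXPR 2+3x", "TERM 2", "TERM 3x"],
     "cover": frozenset({0, 1, 2, 3})},
]

print(sorted(accessible_objects(candidate_parses)))
```

With these candidates, the local parse "EXPR 2+3" is dropped because its cover is a proper
subset of the cover of "EXPR 2+3x", even though the problem state as a whole has no complete
parse.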

13.4 Summary and formal hypotheses 

The arguments in this chapter concern how to restrict the ontology parameter of the model.
The model uses a set of relations and state change operators to represent the way students structure
their views of the current problem state. In most cognitive modeling efforts (e.g., Newell & Simon,
1972), this parameter is left tailorable. Depending on the task and the subject's perception of it, the
theorist constructs a different representation of the problem state, along with the pattern relations
that are used to access it. This approach, dubbed the problem state space approach, gives a great
deal of tailorability to the model.

It was argued that the degree of tailorability was too high. The argument turned on the fact
that the present theory is an inductive learning theory. (N.B., Newell and Simon's theory of human
problem solving (1972) is not a learning theory, so its predictions may be less sensitive to the
tailoring of the problem space.) Subtle changes in the relations that described the problem state or
the operators that changed it are lifted up by induction and placed into the procedure. Hence, the
theorist may control the output of the learner by tailoring the problem state space. This reduces the
theory's ability to explain why some procedures are acquired and others are not.

The ontology parameter cannot be completely fixed since subjects' perceptions of the task
really are different across tasks and individuals. However, certain parts of their understanding do
not change much, given that the domain is limited to mathematical symbol manipulation tasks.
These less variable kinds of knowledge can be fixed by the theory. The basic idea is that the
subject's notions of two-dimensional space do not change much across tasks or from one individual
to another. However, the way that symbols are grouped into aggregate objects does vary. The
theory can fix the spatial relations, but it needs to leave the specification of aggregate objects open
to tailoring.

Two ways to capture this basic idea were discussed. One is to allow the theorist to define
aggregate object relations using Lisp or some other convenient language. The relations act locally,
recognizing instances of themselves in the problem state. For instance, (Column x) might be true
of vertically adjacent pairs of digits. This way of defining relations turned out to have severe
problems. Notions such as adjacency don't really depend much on the local geometric relations
between two symbols, but rather on the problem state as a whole. This gestalt aspect of spatial
knowledge forces the representation to use the definitions of objects and spatial relations in concert
to filter out interpretations that people would not make of the problem state.

To make this cooperative filtering possible, objects are defined by a grammar. The grammar
uses the fundamental spatial relations as part of its definitional formalism. This way of defining
aggregate objects is expressed by the following three hypotheses:

Spatial relations

   The following 5 relations are the spatial relations:

   (First? S x)         Object x is the first part of some sequential object S.
   (Last? S x)          Object x is the last part of some sequential object S.
   (Ordered? S x y)     Object x comes before y in some sequential object S.
   (Adjacent? S x y)    Object x is adjacent to y in some sequential object S.
   (!Part x y)          Object x is a part of object y.

Grammars

   Aggregation of symbols into groups is defined by a spatial grammar based on the notions
   of sequence, part-whole and the compass points: horizontal, vertical and the two
   diagonals. For each aggregate object defined by the grammar, a new categorical relation
   is defined.

Relations

   The relations available to patterns are the spatial relations, the categorical relations
   defined by the grammar, and the usual arithmetic predicates.

The relationship between the grammar and the pattern relations is implemented by categorical
relations. When a new object is defined by the grammar, a new categorical relation becomes
available for use by the patterns. For instance, the grammar might define a multidigit number
with the following two rules:

   NUM --> DIGIT
   NUM --> DIGIT (DIGIT)+ DIGIT ; HORIZONTAL

(The formalism for grammar rules has not been motivated yet. It is discussed in section 15.2. It is
based on some ordinary context-free grammar conventions: parentheses mean a category is optional
and + means a category may be repeated arbitrarily many times.) The first rule says that a number
can be just a single digit. The second rule says that a number can be two or more digits in a
horizontal sequence. Whenever a rule has more than one category on the right side, it must be
annotated with one of the compass points: horizontal, vertical, superscript or subscript.
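
As a minimal sketch of how rules like these could give rise to relations that patterns can use,
the following code parses a left-to-right string of digits as a NUM and derives a categorical
relation, a part relation and the sequence relations from the resulting object. The data layout
and every function name here are assumptions made for illustration; the sketch parses only the
whole string and does not enumerate or filter local sub-parses.

```python
DIGITS = "0123456789"


def parse_num(text):
    """Parse a horizontal run of digit characters as a NUM object, else None.

    One digit corresponds to the first rule; two or more digits correspond
    to the second rule, which is annotated HORIZONTAL.
    """
    if not text or any(ch not in DIGITS for ch in text):
        return None
    # Each part is (position, digit) so that repeated digits stay distinct.
    parts = tuple((i, ch) for i, ch in enumerate(text))
    direction = "HORIZONTAL" if len(parts) > 1 else None
    return {"category": "NUM", "direction": direction, "parts": parts}


def is_num(obj):
    """Categorical relation defined by the grammar: (NUM x)."""
    return isinstance(obj, dict) and obj.get("category") == "NUM"


def part(x, y):
    """(!Part x y): x is a part of the aggregate object y."""
    return x in y["parts"]


def first(s, x):
    """(First? S x): x is the leftmost part of the horizontal sequence S."""
    return s["parts"][0] == x


def last(s, x):
    """(Last? S x): x is the rightmost part of the horizontal sequence S."""
    return s["parts"][-1] == x


def ordered(s, x, y):
    """(Ordered? S x y): x comes before y in the sequence S."""
    parts = s["parts"]
    return x in parts and y in parts and parts.index(x) < parts.index(y)


number = parse_num("23")
two, three = number["parts"]

print(is_num(number), part(two, number), ordered(number, two, three))
```

In this sketch the relations get their meaning from the parsed object rather than from raw
geometry, which is the dependence on the grammar that the text describes.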

The grammar's definitions cause pattern relations to become defined. In this illustration, the
rules cause a categorical relation, NUM, to be defined for patterns. (NUM x) is true of the form
"23." Multi-category rules establish part-whole relations. When "23" is parsed by the second
rule above, (!Part x y) is true when x is the 2 and y is the NUM object 23. Similarly, the other
spatial relations depend on the grammar for their meaning. (Ordered? S x y) is true when S is
the number, x is the 2, and y is the 3.

The categorical relation NUM is not true of the 2 alone in the form 23. One of the main
purposes of using a grammar to represent aggregate objects is to filter out such local parses. This
was argued for in section 13.3. It is captured by the following hypothesis:

Global matching

   The set of objects in a problem state that patterns can match against is limited to those
   that participate in a maximal parse of the problem state as determined by the grammar.

This hypothesis is responsible for keeping "2+3" from being treated as an expression when the
whole problem state is "2+3x." It can be implemented many ways. Sierra implements it by
parsing the problem state bottom-up using the grammar. This yields a set of parse trees. Each
parse tree covers some set of symbols in the problem state. The parse trees that do not cover a
maximal set of symbols are deleted. Those that remain contain, as parse nodes, all the possible
objects that patterns may match against.

Representing state changes

The preceding discussion centered on how students view a static, unchanging scene: the
current problem state. It was assumed that students have an ontology that says what kinds of
aggregate objects and relations are relevant to the task at hand. The task-specific ontology
structures how students view a single problem state. By symmetry, there ought to be a similar
task-specific knowledge source that structures their view of changes in the problem state. I think that
there is such knowledge, but I admit to being quite confused on the subject. At the crux of state
changes lies the notorious frame problem of AI: how does one handle the fact that only a little bit
of a problem state changes at a time, so almost all references into it may remain unchanged? A
central concern is whether foci of attention should refer extensionally or intensionally. These
difficult issues are discussed in appendix 8.

Chapter 14
Pattern Power 



Chapter 12 showed that the interface between procedures and problem states should use
patterns and pattern matching. However, this leaves many issues unresolved. An important
unresolved issue concerns how much descriptive power patterns may have. This issue basically asks
what kinds of logical constructions are used in patterns or, equivalently, what the representation
language for patterns is.

14.1 Viewing patterns as logics

A convenient and natural way to discuss the power of pattern languages is to equate them
with logics. A pattern corresponds to a proposition. What the pattern matches against corresponds
to a model, in the logician's sense of the word "model." Matching a pattern is equivalent to
satisfying the corresponding proposition in the model. Although this is a standard way to look at
patterns, an example might be helpful to bring it into sharper focus. A typical pattern from a
production system or Planner-like language is:

{(?X ISA PLACE) (?X IN !COL) (NOT (?X IS/BLANK))}

Pattern variables are indicated by a ? prefix; variables that are bound before the pattern is
matched are indicated with a ! prefix. What this pattern means is "give me a place X that's inside
the given COL and not blank." The equivalent in a first-order logic would be

∃x (Place x) ∧ (In x COL) ∧ ¬(Blank x)

The quantifier is existentially bound because the pattern should fail to match only if there are no
non-blank places in COL. It should fail in a null (empty) model, for example. If x were universally
bound, the proposition would be true in the null model.
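
The correspondence between matching and existential satisfaction can be made concrete with a
small sketch. It is an illustration only: the model is assumed to be a finite set of ground facts,
the matcher is a brute-force search, and the names match, facts, p1, p2 and c1 are hypothetical.

```python
from itertools import product


def match(pattern, facts, objects, bound):
    """Return bindings for the ?-variables that satisfy every literal, or None.

    pattern: list of (negated, relation, args) literals
    facts:   set of tuples such as ("In", "p1", "c1"), playing the role of the model
    bound:   values of !-variables fixed before matching
    """
    variables = sorted({a for _, _, args in pattern for a in args if a.startswith("?")})
    for values in product(objects, repeat=len(variables)):
        env = {**bound, **dict(zip(variables, values))}

        def holds(literal):
            negated, relation, args = literal
            fact = (relation,) + tuple(env.get(a, a) for a in args)
            return (fact in facts) != negated

        if all(holds(lit) for lit in pattern):
            return dict(zip(variables, values))
    return None


pattern = [(False, "Place", ("?X",)),
           (False, "In", ("?X", "!COL")),
           (True, "Blank", ("?X",))]
facts = {("Place", "p1"), ("Place", "p2"),
         ("In", "p1", "c1"), ("In", "p2", "c1"),
         ("Blank", "p1")}
found = match(pattern, facts, ["p1", "p2", "c1"], {"!COL": "c1"})
empty = match(pattern, set(), [], {"!COL": "c1"})
```

Because the search is existential, it succeeds as soon as one binding satisfies all the literals
and fails in the empty model, where there is nothing to bind ?X to.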

The order of the logic

There are a number of constraints illustrated by the example. First, the logic must be at least
first order. A propositional (variable-less) logic hasn't the expressive power necessary to mention
several notational objects at once. Both x and COL have to be mentioned in the preceding example.
The need to mention several objects at once is entailed by the need to shift focus in the procedure.
Since focus was shown to be necessary for mathematical procedures (section 11.1), the pattern logic
must be at least first order.

There are many higher-order logics. Since they include first-order logic, parsimony counsels
considering them only if first-order logics prove to have inadequate expressive power. So far, the
expressive power of first-order logics has been sufficient to allow formulation of an empirically
adequate theory.

Clausal form

Having established the order of the pattern logic, one can ask whether all the descriptive
power of first-order logic is necessary. A convenient way to do this is to examine, in turn, each
logical connective, quantifier and syntactic device. This approach is complicated by the redundancy
of first-order logic. Almost any expression using a given construction can be converted to a
logically equivalent expression that does not use the construction. To convert an examination of
connectives and other devices into an examination of descriptive power, we need to eliminate this
redundancy.

An easy way to eliminate redundancy is to use a normal (or canonical) form. The normal
form that makes this discussion clearest is clausal form (see any textbook on mathematical logic,
e.g., Yasuhara, 1971). Any proposition in a standard first-order logic can be converted to clausal
form in four steps:

1. Remove implications; e.g.,

¬∀x∃y (P x y) ∧ [∀z (R x y z) ⊃ (Q y z)]

becomes ¬∀x∃y (P x y) ∧ [∀z ¬(R x y z) ∨ (Q y z)]

2. Push negations down to literals; e.g.,

¬∀x∃y (P x y) ∧ [∀z ¬(R x y z) ∨ (Q y z)]

becomes ∃x∀y ¬(P x y) ∨ [∃z (R x y z) ∧ ¬(Q y z)]

3. Skolemize. That is, convert existentially bound variables into Skolem (anonymous) functions.
The arguments of the Skolem function are any universally bound variables whose scope
includes the existential quantifier. Nullary Skolem functions are expressed as Skolem
(anonymous) constants; e.g.,

∃x∀y ¬(P x y) ∨ [∃z (R x y z) ∧ ¬(Q y z)]

becomes ¬(P a y) ∨ [(R a y (f y)) ∧ ¬(Q y (f y))]

where a is a Skolem constant and f is a Skolem function.

4. Convert to product-of-sums form, that is, a conjunction of literals or disjunctions; e.g.,
¬(P a y) ∨ [(R a y (f y)) ∧ ¬(Q y (f y))] becomes

[¬(P a y) ∨ (R a y (f y))] ∧ [¬(P a y) ∨ ¬(Q y (f y))]
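
The four steps can be traced mechanically. The following is a minimal sketch, assuming formulas
are nested Python tuples with the hypothetical operator tags "not", "and", "or", "imp", "all" and
"ex"; it ignores variable renaming and other details a full converter would need, and it is not
taken from the theory's implementation.

```python
def remove_imp(f):
    """Step 1: rewrite (imp p q) as (or (not p) q)."""
    op = f[0]
    if op == "imp":
        return ("or", ("not", remove_imp(f[1])), remove_imp(f[2]))
    if op == "not":
        return ("not", remove_imp(f[1]))
    if op in ("and", "or"):
        return (op, remove_imp(f[1]), remove_imp(f[2]))
    if op in ("all", "ex"):
        return (op, f[1], remove_imp(f[2]))
    return f  # atomic formula such as ("P", "x", "y")


def push_not(f, neg=False):
    """Step 2: push negations down to literals (neg = an odd number of negations above)."""
    op = f[0]
    if op == "not":
        return push_not(f[1], not neg)
    if op in ("and", "or"):
        dual = {"and": "or", "or": "and"}[op] if neg else op
        return (dual, push_not(f[1], neg), push_not(f[2], neg))
    if op in ("all", "ex"):
        dual = {"all": "ex", "ex": "all"}[op] if neg else op
        return (dual, f[1], push_not(f[2], neg))
    return ("not", f) if neg else f


def skolemize(f, scope=(), subst=None, fresh=None):
    """Step 3: drop quantifiers, replacing existential variables by Skolem terms."""
    subst = subst or {}
    fresh = fresh if fresh is not None else {"const": iter("abcde"), "func": iter("fgh")}
    op = f[0]
    if op == "all":
        return skolemize(f[2], scope + (f[1],), subst, fresh)
    if op == "ex":
        term = (next(fresh["func"]),) + scope if scope else next(fresh["const"])
        return skolemize(f[2], scope, {**subst, f[1]: term}, fresh)
    if op in ("and", "or"):
        return (op, skolemize(f[1], scope, subst, fresh), skolemize(f[2], scope, subst, fresh))
    if op == "not":
        return ("not", skolemize(f[1], scope, subst, fresh))
    return (op,) + tuple(subst.get(t, t) for t in f[1:])


def distribute(f):
    """Step 4: distribute "or" over "and" to reach product-of-sums form."""
    op = f[0]
    if op == "and":
        return ("and", distribute(f[1]), distribute(f[2]))
    if op == "or":
        left, right = distribute(f[1]), distribute(f[2])
        if left[0] == "and":
            return ("and", distribute(("or", left[1], right)), distribute(("or", left[2], right)))
        if right[0] == "and":
            return ("and", distribute(("or", left, right[1])), distribute(("or", left, right[2])))
        return ("or", left, right)
    return f


# The running example of the text.
start = ("not", ("all", "x", ("ex", "y",
         ("and", ("P", "x", "y"),
                 ("all", "z", ("imp", ("R", "x", "y", "z"), ("Q", "y", "z")))))))
clausal = distribute(skolemize(push_not(remove_imp(start))))
```

Applied to the running example, the sketch yields the conjunction of the two disjunctions shown in
step 4, with the Skolem constant a standing for x and the Skolem term (f y) standing for z.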

While this version of clausal form is not quite a normal form (order within disjunctions and
conjunctions has not been stipulated), it yields a short check list of constructions:

1. predicates 

2. conjunctions 

3. Skolem constants (wide scope existential quantifiers) 

4. constants 

5. functions 

6. negation 

7. variables (universal quantifiers)

8. Skolem functions (narrow scope existential quantifiers)

9. disjunctions 

This list is the topic of the chapter. The discussion centers on which of these expressive facilities
the pattern language should have. It will be shown that the first three are necessary, the next three
are optional and the last three should be prohibited.

14.2 Predicates, conjunctions and Skolem constants

The first three constructs are fundamental to patterns. All three are needed to express even
the simple pattern mentioned earlier. The pattern and its equivalent clausal form expression are:

{(?X ISA PLACE) (?X IN !COL) (NOT (?X IS/BLANK))}

(Place a) ∧ (In a COL) ∧ (NonBlank a)

The predicates and conjunctions are obvious. The pattern variable, ?X, has been converted to a
Skolem constant, a. It will be assumed that predicates, conjunctions and Skolem constants are a
part of the pattern representation language.

14.3 Constants, negations and functions

In the Planner-style pattern,

{(?X ISA PLACE) (?X IN !COL) (NOT (?X IS/BLANK))}

the variable !COL is bound outside the pattern. When the pattern is converted to clausal form,

(Place a) ∧ (In a COL) ∧ (NonBlank a)

it is converted to a quasi-constant, COL. COL behaves like a constant with respect to pattern
matching in that the matcher doesn't try to assign a binding to it. It behaves like a variable in that
its value (i.e., the binding assigned to it outside the pattern), not its name, is what is used by the
pattern matcher. In order to interface patterns with the data flow machinery that manipulates focus
of attention, COL-like quasi-constants are needed in patterns.

For regular constants, such as numbers, it is a moot point whether they are in the pattern
language. The expressive power of numeric constants can be had by adding primitive arithmetic
relations to the language. Thus, to eliminate (Equal x '5), one employs (Five x). So far, no
empirical consequences have been discovered that could discriminate patterns with constants from
patterns without them. As it turns out, Sierra's implementation of the grammar automatically
generates such "constant" relations, naming them with the symbols themselves: (5 x) means
(Five x) and (+ x) means that x is a plus sign. These are used instead of constants in patterns.

Functions are similar to constants. By manipulation of the set of predicates, one can eliminate
these devices. To eliminate functions, one uses relations: (P y (f x)) becomes (P y w) ∧ (F w x).
This variability can be used to express the show-work principle as a constraint on the syntax of
patterns. The issue is discussed in detail in section 15.1. However, some basic ideas will be briefly
presented here. The show-work principle says that if a subprocedure is to be learned, any
intermediate results of its computations must be written down on the page or chalkboard in the
worked exercises that teach it. This has implications for the use of arithmetic functions because
they produce "invisible objects," namely numbers that are not usually present in the problem state.
One use of patterns is in the applicability conditions of rules. They are used to test whether a rule
may be run by the interpreter in the current problem state. If arithmetic functions are used there,
then the value that the arithmetic function produces is never written down. It is calculated as part
of matching the pattern, but it is never passed further. This use of functions can't be learned when
learning obeys the show-work principle. Hence, arithmetic functions can be omitted from patterns
that serve as applicability conditions.

Like constants and functions, negation can also be eliminated by modifying the set of pattern
relations. In clausal form, negations only appear on literals. Consequently, negation can be
eliminated by doubling the set of pattern relations: the negation of each pattern relation is added if
it doesn't already exist. To eliminate (Not (> T B)), one uses either (Not> T B) or (≤ T B).
So the issue of whether to permit negations and functions in the pattern logic is moot with respect
to expressiveness. The convention that Sierra uses is to allow negations on arithmetic relations only.
The set of spatial and categorical relations (see section 13.4) is designed in such a way that negation
is not needed for them.
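
The "doubling" of the relation set can be illustrated with a short sketch. The COMPLEMENT table
and the naming scheme for relations without a listed complement are assumptions made for the
example, not Sierra's conventions.

```python
# Hypothetical table pairing each arithmetic relation with its complement.
COMPLEMENT = {">": "<=", "<=": ">", "<": ">=", ">=": "<", "=": "!=", "!=": "="}


def eliminate_negation(literal):
    """Rewrite ("Not", (rel, *args)) as a positive literal over a complementary relation."""
    if literal[0] != "Not":
        return literal
    relation, *args = literal[1]
    # Relations without a listed complement get a new relation named "Not<rel>".
    return (COMPLEMENT.get(relation, "Not" + relation), *args)
```

With this table, the negated literal ("Not", (">", "T", "B")) is rewritten as ("<=", "T", "B"),
and a relation with no existing complement simply receives a new negative counterpart.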

14.4 Disjunctions are needed for variables and Skolem functions

Disjunctions, variables and Skolem functions are closely related. Clearly, if the logic forbids
variables (universal quantifiers) then there will be no Skolem functions (existentials inside the scope
of universal quantifiers). Somewhat less obvious is that forbidding disjunctions guts variables of
their expressive power.

In practice, most universally quantified expressions have the form ∀x(P⊃R). The P
expression defines a domain of quantification to be some subset of the objects of the problem state.
The R expression asserts something about the objects in that subset. Consider, for example, an
expression that might be useful in algebra:

∃x∃y (LikeTermsP x y) ∧

[∀z (Between? x z y) ⊃ ¬(LikeTermsP x z)]

This asserts that x and y are like terms, and that everything between x and y is not a like term to
them. This expression might be used to find the closest term y that can be combined with x. The
universally quantified subexpression has the typical form ∀x(P⊃R). It says that none of the
objects between x and y are like terms to x. If x were the first term of

2p² + 3r + 5p² + 6p²

then y would be matched to 5p² instead of 6p². When the expression is represented in clausal
form, it becomes

(LikeTermsP a b) ∧

[¬(Between? a z b) ∨ ¬(LikeTermsP a z)]

where a and b are Skolem constants, and z is a universally quantified variable. The point is that
the implication has become a disjunction. If the clausal form forbids disjunction, then the usual
∀x(P⊃R) expressions cannot be used. The only ones that can be used have a pure conjunction or a
single literal as the interior. Such expressions would assert something of every object in the
problem state. This is rather useless for pattern matching since it doesn't discriminate among
various objects. It says something about the problem state as a whole, but not about how to tell the
desired objects from the others. It is difficult to believe that variables (universal quantifiers)
would ever be used in patterns if they could not employ disjunction. In short, if there is no
disjunction, then there's no need for variables (universal quantifiers) and hence no need for Skolem
functions (narrow scope existential quantifiers).

14.5 Disjunctions in patterns

Chapter 4 showed that inductive learning of disjunctions is impossible unless some constraint
is placed on the occurrence of disjunctions in generalizations. It was shown that the observed
procedures could be induced if induction was constrained either to choose the minimal number of
disjunctions or to learn at most one disjunction per lesson. The latter position was shown to explain
the almost universal use of lessons as an instructional aid. As such, it was preferred in the theory
on grounds of explanatory adequacy. This, and other arguments, motivated the acceptance of the
one-disjunct-per-lesson hypothesis. The hypothesis applies to patterns as well as control structure,
of course. Disjunctions in patterns are not introduced during ordinary inductive learning of a
subprocedure. Instead, each disjunction is the subject of a lesson itself.

This assertion has an immediate entailment: disjunctive notational concepts must be
represented in such a way that they can be used in any pattern. To see this, suppose that "column"
is a disjunctive notational concept. This is not so implausible since there are two kinds of
subtraction columns:

  6 5
-   3

The tens column has one digit, the units column has two. Suppose further that "column" is not
represented in a way that allows it to be shared among patterns. To describe "leftmost column"
requires the notion of "column," which is disjunctive, so "leftmost column" is disjunctive. Since
learning a disjunctive pattern requires a lesson of its own, "leftmost column" would require a
special lesson. To describe "left-adjacent column" would require another lesson. Every pattern
employing the notion of "column" would have a disjunction, and one-disjunct-per-lesson entails that
each such disjunction must have its own lesson. Clearly, this is not how notational knowledge is
acquired. Instead, the concept "column" is taught once as a notational term. Afterwards, any
pattern that employs the notion of columns just uses the term's name, e.g., as a predicate
(COLUMN x). So, disjunctive notational concepts must be represented in a way that allows them to
be shared among patterns.

To put it differently, learning notation involves learning the definitions of terms; once a term
like "column" is defined, a token standing for it may occur in any pattern. Disjunctions therefore
occur only in the definitions of notational terms, and not in patterns. But what knowledge base has
these term definitions? Clearly, this argument has provided independent motivation for the
grammar: it is a repository for definitions of notational terms. Earlier, we inferred
its existence as a solution to the myopia, robustness and local ambiguity problems of pattern
matching. Here its existence has been supported as an entailment of one-disjunct-per-lesson. This
convergence is a weak but gratifying argument in support of banning disjunctions from patterns.
Although it is logically void (because it is an abduction, not a deduction), it seems to indicate that
we are on the right track. To reiterate the basic idea, a pattern can't say "it's either two vertically
aligned digits or a digit over a blank." It can only say "it's a column" and the grammar defines
"column" with the disjunctive description "a column is two vertically aligned digits or a digit over a
blank."

This application of one-disjunct-per-lesson makes empirical predictions. As an example,
consider the predicate LikeTermsP. This predicate could be defined as

(LikeTermsP x y) =
    (Term x) ∧
    (Term y) ∧
    [∀w (Factor w x) ⊃
        [(Number w) ∨ ∃z (Factor z y) ∧ (IsomorphicP z w)]] ∧
    [∀w (Factor w y) ⊃
        [(Number w) ∨ ∃z (Factor z x) ∧ (IsomorphicP z w)]]

This stipulates that x and y be terms that have identical factors, except for numerical factors.
Hence, 3x²y is a like term to 2y3x² but not to 3x². LikeTermsP must be defined using
universal quantifiers and disjunctions. There is no way to express it without at least one
disjunction. Consequently, if a student doesn't know the definition of LikeTermsP before being
shown how to combine terms, it is predicted that the student won't induce the correct patterns for
that transformation. To do so, the student would have to induce disjunctions in patterns, and that
is ruled out by the one-disjunct-per-lesson learning principle.
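
To see where the quantifiers and the disjunction enter, the definition can be transcribed into a
short sketch. The representation is an assumption made for illustration: a term is a list of
factors, a numeric factor is a Python number, a non-numeric factor is a (base, exponent) pair, and
IsomorphicP is approximated by equality. The (Term x) conditions are left implicit. The second
function transcribes the closest-like-term expression of section 14.4.

```python
def is_number(factor):
    return isinstance(factor, (int, float))


def like_terms_p(x, y):
    """x and y are like terms if every non-numeric factor of each has a match in the other."""
    def covered(a, b):
        # For all w in a: (Number w) or there exists z in b isomorphic (here: equal) to w.
        return all(is_number(w) or any(z == w for z in b) for w in a)
    return covered(x, y) and covered(y, x)


def closest_like_term(terms, i):
    """Index of the nearest later term that is like terms[i], or None."""
    for j in range(i + 1, len(terms)):
        nothing_between = all(not like_terms_p(terms[i], z) for z in terms[i + 1:j])
        if like_terms_p(terms[i], terms[j]) and nothing_between:
            return j
    return None


# Terms as lists of factors; ("x", 2) stands for x squared.
t1 = [3, ("x", 2), ("y", 1)]        # 3x²y
t2 = [2, ("y", 1), 3, ("x", 2)]     # 2y3x²
t3 = [3, ("x", 2)]                  # 3x²
poly = [[2, ("p", 2)], [3, ("r", 1)], [5, ("p", 2)], [6, ("p", 2)]]   # 2p² + 3r + 5p² + 6p²
```

The `or` inside `covered` is the disjunction that cannot be removed from the definition, and each
`all(...)` plays the role of a universal quantifier restricted to a domain, the ∀x(P⊃R) form
discussed in section 14.4.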

Remarkably, every algebra text that I have examined has a short lesson teaching
LikeTermsP before the first lesson on combining like terms. This supports the prediction that
non-primitive predicates with disjunctive definitions are taught in their own lesson.

Another way to learn aggregate objects

Although many notational objects in algebra are introduced with explicit lessons, this is not
generally the case for arithmetic notational objects. In particular, the notational term "column" is
not introduced in its own lesson. Instead, it appears that "column" is taught by using a special
device. In section 15.2, it is shown that lines, such as the bar used in subtraction problems, do not
obey the same grammatical conventions as other symbols. Instead, they are apparently used to
mark the boundaries of forms. (This idea was suggested to me by Jim Greeno.) Thus, the bar of
subtraction marks the boundary between the answer row and the rest of the problem. Carrying this
idea one step further, lines might be used to teach new notational concepts. When subtraction
problems are first introduced, all the textbooks that I have seen use lines to mark columns.
Examples are



   tens | units          tens | units
  ------+-------        ------+-------
     3  |   7              3  |   7
     1  |   5                 |   5
  ------+-------        ------+-------
     2  |   2              3  |   2



The vertical and horizontal lines indicate how to parse the problem state. In particular, they
indicate that there are two columns. The columns are even named. Given enough drawings like
this and the convention that lines mark boundaries, the learner can learn the aggregate object term
"column" without an explicit lesson devoted to the subject. Although not much is known yet about
how new notational terms are acquired for the student's grammar, it seems that the basic position of
applying one-disjunct-per-lesson learning to grammar acquisition is quite plausible.

14.6 Summary and formal hypotheses

The issue discussed in this chapter is how much expressive power to give to patterns. It is
shown that the only really critical question is whether disjunctions are allowed in patterns. If they
are banned, then universal quantification and narrow scope existential quantification are no longer
useful, so they can be dropped.

Whether to have disjunctions in patterns is a tricky issue. It is clear that disjunction cannot
be completely omitted from the interface because some notational concepts must employ it.
However, inductive learning of disjunctive concepts is an impossible task unless induction is strongly
biased or constrained in some other way. The proposed solution is twofold. Notational disjunctions
are learned with special devices, such as an explicit lesson, that tell the inductive learner how to
formulate the disjunction. This is a simple application of the one-disjunct-per-lesson hypothesis.
The second half of the solution is to note that it only makes sense to acquire a disjunctive concept
as part of the definition of a new notational term (aggregate object). Doing so makes the concept
available for other patterns, and not just the pattern at hand.

Putting these two halves together implies that disjunctions, and by implication, universal
quantifications as well, occur only in the definitions of notational terms. Notational terms are
defined in the grammar. Although the descriptions in a grammar may be complex, patterns are
simple. In particular, since patterns lack disjunction and universal quantification, they are reduced
to simple conjunctions of relations. By suppressing the logical connective ∧, a pattern can be
represented even more simply as a set of pattern relations. The arguments of the pattern relations
are either pattern variables (i.e., Skolem constants) or goal arguments. A pattern relation can also
be enclosed in a negation. That is all the logical machinery that is needed. Patterns can be just that
simple. These considerations are captured in the following hypothesis:



Conjunctive patterns 

Disjunctions, universal quantifiers and narrow scope existential quantifiers are banned 
from patterns. Semantically, a pattern is a conjunction of possibly negated predicates on 
existentially quantified variables, functions and constants.

This hypothesis allows functions and constants in patterns since it was shown that their inclusion or
exclusion is basically a syntactic matter. The issue will be dealt with (briefly) in the next chapter.

Chapter 15
Syntax of the Representation Languages

Almost all the major aspects of the knowledge representation language have been discussed.
Only one major feature remains to be discussed. Chapter 17 will show that test patterns
(applicability conditions) should be distinct from fetch patterns (focus shifting patterns). This
distinction will be assumed here so that this chapter may complete the discussion of the
representation language by defining its syntax. That is, the syntax of goals, rules and grammar rules
will be fixed. Even at this seemingly inconsequential level of detail, there are a few problems
whose solutions impact the theory's predictions. For instance, it makes a difference whether
grammars are represented as first-order logics or context-free grammars. However, the impact of
such competing alternatives is minor compared to the kinds of representation issues that have
already been discussed. Most of this chapter will simply describe the choices taken by the theory;
there will be little discussion of the alternatives. Some readers may wish to skip this chapter.

15.1 Syntax of the procedure representation language 

Chapter 10 argued that goals have a binary type to distinguish AND goals from OR goals.
There are three traditional syntaxes for binary-typed goal structures: CFGs (context-free grammars),
ATNs (Augmented Transition Nets), and AOGs (And-Or Graphs). The issue here is essentially a
topological one. There is no difference in the expressive power of the representations. Trivial
algorithms exist to translate an expression in one (i.e., any AOG, any CFG, or any ATN) into an
equivalent expression in the other. Where the three representations differ is their effect on
operations that manipulate them as structures. Their shape affects how elegantly and
parsimoniously each operation can be formalized. Clearly, it doesn't affect whether or not the
operation can be formalized. If the operation can be defined for any of them, one can translate
expressions in the others into the tractable representation, perform the operation, then translate the
result back into the original representation. So it is only the elegance of the theory that is at stake
here.

CFGs are not compatible with trivial ORs

In a CFG, OR goals are represented by non-terminals and AND goals are represented by the
right sides of rules (see figure 15-1a). In an ATN, AND goals are represented by levels, and OR goals
are represented by states (see figure 15-1b). Both CFGs and ATNs naturally generate "trivial" goals.
A trivial goal has just one subgoal. CFGs generate trivial AND goals corresponding to rules with just
one category on the right side (e.g., A → B). ATNs have trivial OR goals whenever a state has just
one arc leaving it (e.g., the first state of the ATN of figure 15-1b). Trivial ORs are a convenience in
subprocedure acquisition. To acquire a new subprocedure in an ATN, one simply adds a new arc.
For instance, figure 15-1c shows the ATN of 15-1b with a new subprocedure, E, added to it.
Because of the implicit trivial OR at the first state, the new subprocedure could be added without
making any structural changes to the old ATN. When the representation uses trivial OR goals, the
assimilation conjecture (section 10.1) can be interpreted in a quite literal fashion: acquiring a new
subprocedure changes none of the old goal structure.

a.  A → B                   =  (OR (AND B)
    A → C D                        (AND C D))

b.  [ATN diagram]           =  (AND (OR B)
                                    (OR C D))

c.  [ATN diagram]           =  (AND (OR B E)
                                    (OR C D))

d.  [ATN diagram]           =  (AND (OR F)
                                    (OR G))

Figure 15-1

A CFG and several ATNs, with their logical equivalents



Trivial ORs are worth having in order to allow subprocedure assimilation to be simple and
elegant. To get them requires a little extra work. Whenever a new subprocedure's AND has more
than one subgoal, the new subgoals are placed inside trivial ORs. If the new subprocedure E of
figure 15-1c had subgoals, as in figure 15-1d, they would be placed in trivial ORs. This means that
all the disjunctions that the learner could possibly use are already in the goal structure; to add a
subprocedure, the learner just adds a new disjunct to some existing disjunction. Automatic
addition of trivial ORs is natural in the ATN syntax. It can be stipulated for the AOG framework.
For CFGs, stipulation won't work. Adding the extra rules that are needed for the trivial ORs also
introduces trivial ANDs. These trivial ANDs clutter up the goal structure, making the Backup repair
and other structure-sensitive operations go awry. So the choice of three syntaxes is narrowed to
two: ATNs and AOGs.



ATNs will not let AND rules shift focus

In an AOG, both AND goals and OR goals have arguments. In an ATN, only the AND goals
have arguments. They are called registers. Each ATN level (= AND goal) has its own locally bound
registers. The applicative hypothesis entails that focus shifting occur only in the actions of the OR
rules (arcs) since these are what call ATN levels. In an AOG, both AND rules and OR rules may shift
focus. It will be shown that it is better to use focus shifting on AND rules only. Since this is just
the opposite of the ATN convention, it entails that AOGs are a better syntax than ATNs.

before:                 after:

   R1                      R1
    |                       |
  OR13                    OR13
    |                    /      \
   R2                  R3        R2
                        |
                      AND14
                     /      \
                   R4        R5
                    |         |
                  OR14      OR15
                    |         |
                   R6        R7

Figure 15-2

An AOG fragment before and after a new subprocedure's acquisition.

In the ATN syntax, when a new subprocedure is acquired, a new arc and (in general) a new
level are added to the ATN. The new level needs to be given registers. The adjoining arc needs to
be given a focus shifting function. For instance, suppose the learner acquired the ATN of figures
15-1c and 15-1d given the ATN of figure 15-1b. The learner would have to provide registers for
level E and a focus shifting function for the arc of 15-1c that calls it. In addition, the learner has to
induce the focus shifting functions for each of the actions of the new level, namely arcs F and G.
The simplest way to provide a focus shifting function for the adjoining arc (arc E) is to make it a
"null focus shift," that is, simply pass the registers of the calling level (level A) down to the called
level (level E). This entails that all levels would have exactly the same register contents; focus
would never be shifted, except just before a primitive action. This simple way of deciding the
adjoining arc's focus will not allow even a correct subprocedure to be acquired. It's unworkable.

Another simple tactic is to assign to the adjoining arc the focus shifting function that would
have been given to the first arc of the new level (arc F in figure 15-1d). That is, the focus that is
appropriate for the first action is made the focus of the entire level. Although the details won't be
presented here, this tactic will not work either. For instance, it won't allow the main column loop
of subtraction to be acquired.

Two simple methods have failed to assign the adjoining arc a focus shifting function. Some
complicated method appears necessary. However, a simpler path is to abandon ATNs and let only
the AND rules bear the focus shifting function. OR rules will just pass the focus of the caller to the
callee. Figure 15-2 illustrates the acquisition of a new subprocedure under these conventions. The
adjoining rule, R3, receives no focus shifting function. It just passes OR13's arguments to the new
AND, AND14. On the other hand, rules R4 and R5 are assigned focus shifting functions. Under the
ATN syntax, R6 and R7 would receive the focus shifting functions that R4 and R5 receive in the
AOG syntax, but the ATN syntax also requires that R3 have a focus shifting function. The
problematic focus shift of R3 is avoided if the AOG syntax is used.



No applicability conditions on AND rules

It has been shown that subprocedure adjunction is simplest if (1) the procedure includes
trivial ORs, and (2) only AND rules have focus shifting functions. To these conventions, one can add
the observation that AND rules have no use for applicability conditions. All rules of an AND goal
will be executed in order. If it is learned that the first subgoal of (AND A B) is optional, the
structural modification will be:

(AND (OR A) (OR B)) ⇒ (AND (OR A Noop) (OR B))

where Noop is an action that makes no change in the problem state. The point is that trivial ORs
mean that all knowledge about applicability can be captured on OR rules. There is no variability in
the sequence or applicability of the AND rules. Hence, applicability conditions can be omitted on
AND rules.
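
The structural modification can be sketched as a pair of tuple manipulations. This is an
illustration under stated assumptions: goals are represented as nested Python tuples tagged "AND"
or "OR", and the function names are hypothetical rather than part of the theory's implementation.

```python
def with_trivial_ors(and_goal):
    """Wrap every subgoal of an AND goal in an OR, unless it already is one."""
    kind, *subgoals = and_goal
    assert kind == "AND"
    return ("AND", *[g if isinstance(g, tuple) and g[0] == "OR" else ("OR", g)
                     for g in subgoals])


def make_optional(and_goal, index):
    """Make the index-th subgoal optional by adding Noop as a disjunct of its OR."""
    kind, *subgoals = and_goal
    subgoals[index] = (*subgoals[index], "Noop")
    return (kind, *subgoals)


goal = with_trivial_ors(("AND", "A", "B"))
relaxed = make_optional(goal, 0)
```

Note that `make_optional` only extends an existing OR; the AND goal keeps the same number of
subgoals in the same order, which is why no applicability condition is needed on its rules.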

Facts functions on OR rules only

For some actions, facts functions are required. For instance, when simple borrowing is first
acquired, the borrow-from action is to (1) fetch the next column's top digit, and (2) subtract one
from it. The focus shift (1) must be on an AND rule, Borrow's first rule in fact. The issue is
whether to put the facts function (2) in the same place. If it is on the AND rule, then both the
focus and the decremented number will be passed to the trivial OR that is between Borrow's first
rule and the action that writes the new number down. When BFZ (i.e., borrow from zero) is
acquired, it will be adjoined beneath this trivial OR. It will be passed the OR's arguments to use as
its arguments. With respect to the focus portion of the trivial OR's arguments, this makes good
sense. However, it makes little sense to pass the decremented value of the top digit. In fact, that
value won't even be defined since a zero would have to be decremented to obtain it. Clearly the
facts function Sub1 must be beneath the trivial OR if BFZ is to be acquired. That is, Borrow and
its trivial OR must have the following syntax:

BORROW (COL) Type: AND
1. <a fetch pattern that binds T to the top digit of the next column to the left of COL>
   => (OR13 T)

2.

OR13 (TD) Type: OR
1. true => (Overwrite TD (Sub1 (Read TD)))

The fetch is on the AND's rule 1, but the facts function is on the OR rule.

This example prompts the general constraint that whenever a new subprocedure has an action
involving a facts function, the function nest is separated from the fetch pattern and placed on the
rule of the trivial OR corresponding to the action.

This convention makes it simple to state the show-work principle: all invisible object
descriptions are represented by functions, such as the facts functions. These are located in a special
place on OR rules, namely the argument positions of the subgoals. With this convention, the
show-work principle amounts to (1) prohibiting functions in patterns and (2) limiting function nests to
containing at most one function that produces an invisible object (i.e., (Sub1 (Read T)) is okay
but (Sub1 (Sub1 (Read T))) is not). This means that pattern syntax is very simple: it is a
conjunction of relations whose arguments are all variables and goal arguments. Relations such as
(Equal? X (Sub1 Y)) are banned. This syntactic convention entails that relations that might be
most rationally written with a Read function, e.g., (LessThan? (Read T) (Read B)), are better
written directly in terms of variables, as in (LessThan? T B). With this definition of the facts
predicates, the prohibition against functions in patterns can be made total.
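
The second half of the show-work principle can be checked mechanically. The following is a minimal sketch in Python, not part of Sierra. It assumes a hypothetical encoding in which a function nest is a nested tuple and a bare string is a variable, and it assumes that Sub1 produces an invisible object while Read does not.

```python
# Hypothetical encoding: ("Sub1", ("Read", "T")) stands for (Sub1 (Read T)).
# Assumption: Sub1 creates an invisible object; Read does not.
INVISIBLE_PRODUCERS = {"Sub1"}


def count_invisible(nest):
    """Count the functions in a nest that produce an invisible object."""
    if isinstance(nest, str):  # a variable contributes nothing
        return 0
    head, *args = nest
    own = 1 if head in INVISIBLE_PRODUCERS else 0
    return own + sum(count_invisible(arg) for arg in args)


def obeys_show_work(nest):
    """A nest is allowed if it holds at most one invisible-object function."""
    return count_invisible(nest) <= 1


ok = obeys_show_work(("Sub1", ("Read", "T")))
too_deep = obeys_show_work(("Sub1", ("Sub1", ("Read", "T"))))
```

Under these assumptions, `ok` is true and `too_deep` is false, matching the two nests discussed above.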

Summary 

The conclusion is that the procedure representation language should be an AOG language
subject to the following restrictions:

Trivial ORs: Every AND goal and every primitive goal is a subgoal of some
OR, even if that OR has only one subgoal.

OR rules don't fetch: OR rules do not have fetch patterns. Their only use for
patterns is as applicability conditions, determining whether or not to run.

AND rules don't test: AND rules do not have applicability conditions. Their
patterns are used for shifting the focus of attention.

AND rules don't have facts functions: Functions which create invisible
objects are contained in the actions of OR rules only.

Patterns have no functions: Neither test patterns nor fetch patterns have
functions. A pattern is represented as a set of relations on variables.

These conventions mean that AOGs can use a simple syntax for rules. A rule has 4 parts:

1. The name of the goal it's under (the goal). 

2. The name of the goal it calls (the action's subgoal).

3. A list of functions and/or variables that provide arguments for the subgoal (the
action's arguments).

4. A pattern. This is interpreted as a fetch pattern for AND rules and a test pattern
(applicability condition) for OR rules.
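
The four-part rule syntax can be pictured as a small record. This is an illustrative sketch in Python under assumed names (Rule, pattern_role and the goal_types table are hypothetical, not Sierra's data structures). It shows how the same pattern slot is read as a fetch pattern or a test pattern depending on the type of the goal the rule is under.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union

Relation = Tuple[str, ...]   # e.g. ("LessThan?", "T", "B")
Arg = Union[str, tuple]      # a variable name or a function nest


@dataclass(frozen=True)
class Rule:
    goal: str                     # 1. the goal the rule is under
    subgoal: str                  # 2. the goal the rule calls
    args: Tuple[Arg, ...]         # 3. arguments passed to the subgoal
    pattern: FrozenSet[Relation]  # 4. a set of relations on variables


def pattern_role(rule, goal_types):
    """Fetch pattern under an AND goal, test pattern under an OR goal."""
    return "fetch" if goal_types[rule.goal] == "AND" else "test"


# The trivial OR from the Borrow example; the empty pattern stands for "true".
goal_types = {"BORROW": "AND", "OR13": "OR"}
or13_rule = Rule("OR13", "Overwrite", ("TD", ("Sub1", ("Read", "TD"))), frozenset())
borrow_rule = Rule("BORROW", "OR13", ("T",), frozenset())
```

Here the facts function nest sits in the argument list of the OR rule, and the AND rule passes only the variable T, as the restrictions above require.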

Patterns are simply sets of relations. However, for convenience, the non-grammatical facts
predicates (e.g., LessThan?) may be negated. Spatial and categorical relations are designed in
such a way that negation is not needed for them.

15.2 Syntax of the grammar representation language 

The student's notational knowledge is assumed to be a context-free grammar. Although it was
implied in chapter 13 that the grammar should have the descriptive power of first-order logic, CFGs
do not have quite that much power. In particular, CFGs have difficulty representing notational
terms that would be easily represented if the representational language had universal quantifiers. So
far, I know of only one such term, L ikeTerrtioP (see 14.4). It cannot be represented in a CFG, or
at least in the CFG language that Sierra uses. Nonetheless, CFGs have many computational
advantages that outweigh this minor lack of expressiveness.

Expressing the universal spatial relations

A main purpose of the grammar representation language is to embed the spatial relations that
are held to be task- and subject-independent. There are four spatial ideas:

Part-whole   Aggregate objects have parts.

Adjacency    One object is next to another and no object is between them.

Sequence     Aggregate objects are sometimes sequences, in which case the spatial relations
             First, Last, Before and After are well defined among the elements of the sequence.

Direction    For mathematics, the compass points (horizontal, vertical, superscript and
             subscript) are the important directions.

In general, the simplest CFG languages employ only part-whole and adjacency. The rule A ---> B C
means that B and C are parts of A and that they are adjacent. Whenever such a rule occurs in the
grammar, the categorical relation (A x) is defined. Moreover, when the rule is used to parse some
objects, call them a, b and c, then the following relations are automatically true:

(!Part a b)         Object b is a part of a.

(!Part a c)         Object c is a part of object a.

(Adjacent? a b c)   Object b is adjacent to object c.

(Adjacent? a c b)   Object c is adjacent to object b.
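
The relations licensed by a two-constituent rule can be sketched as a function from a parse to a set of tuples. This is a minimal Python illustration, assuming a hypothetical encoding of relations as tuples. The function name is invented here and is not part of Sierra.

```python
def relations_for_binary_parse(category, a, b, c):
    """Relations that hold when b and c are parsed as the parts of a
    by a rule of the form  category ---> B C."""
    return {
        (category, a),            # the categorical relation, e.g. (A a)
        ("!Part", a, b),          # b is a part of a
        ("!Part", a, c),          # c is a part of a
        ("Adjacent?", a, b, c),   # b is adjacent to c within a
        ("Adjacent?", a, c, b),   # c is adjacent to b within a
    }


rels = relations_for_binary_parse("A", "a", "b", "c")
```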

To add the idea of sequence to the grammar language is easy. The standard artifice of a Kleene
plus is used, but in a restricted context. If an aggregate object is a sequence, it is defined by a rule
whose right side has three elements:

EXPR ---> TERM (SIGNED-TERM)+ SIGNED-TERM

This rule means that an algebra expression (EXPR) is a list whose first element is a term. The last
element is a signed term (i.e., a term with + or - ahead of it). It may have zero or more
intermediate elements. The parentheses mean that the middle element is optional; the Kleene plus
means that it can be iterated. The only place a Kleene plus may occur is on the middle element of
a three-category right-hand side. This implements the notion of sequence. It also implements the
notion that the endpoints of sequences are special. They can be a different category from the
interior elements of the sequence. As we see above, the lead element of an algebraic expression
may or may not have a sign as its first symbol, but the remaining elements of the expression must
have signs.

Whenever some symbols are parsed by a rule such as the one above, certain relations
automatically are true of them. For instance, if a is parsed as an EXPR with b and c as its parts,
then the following relations would be true:

(First? a b)        Object b is the first element of sequence a.

(Last? a c)         Object c is the last element of sequence a.

(Ordered? a b c)    Object b comes before c in sequence a.

(!Part a b)         Object b is a part of a.

(!Part a c)         Object c is a part of object a.

(Adjacent? a b c)   Object b is adjacent to object c.

(Adjacent? a c b)   Object c is adjacent to object b.

That is, the normal part-whole and adjacency relations have been joined by the spatial relations for
sequence.
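
The two-element case above generalizes naturally to longer sequences. The following Python sketch is one plausible reading, not Sierra's implementation: it assumes that Ordered? holds for every pair of elements taken in left-to-right order and that Adjacent? holds only between neighbours. The text above states only the two-element case, so both are labeled assumptions.

```python
from itertools import combinations


def sequence_relations(seq, elements):
    """Relations assumed to hold when `elements` (left to right) are parsed
    as the sequence `seq` by a FIRST (MIDDLE)+ LAST rule."""
    rels = {("First?", seq, elements[0]), ("Last?", seq, elements[-1])}
    for e in elements:
        rels.add(("!Part", seq, e))
    # combinations() keeps the input order, so x always precedes y here.
    for x, y in combinations(elements, 2):
        rels.add(("Ordered?", seq, x, y))
    # Assumption: adjacency is symmetric and holds only between neighbours.
    for x, y in zip(elements, elements[1:]):
        rels.add(("Adjacent?", seq, x, y))
        rels.add(("Adjacent?", seq, y, x))
    return rels


two = sequence_relations("a", ["b", "c"])
three = sequence_relations("s", ["x", "y", "z"])
```

For the two-element call, the result is exactly the seven relations listed above.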

Category redundancy rules

In order to get the parse tree to match the part-whole relations, rules that have just one
category on the right, e.g., A ---> B, are treated specially. These rules essentially express a
categorical redundancy. A ---> B says that B is an A. Thus, NUMBER ---> DIGIT means that a digit
is a kind of number. Such rules are called category redundancy rules. They are parsed differently.
If "5" is parsed as a number, then the digit 5 is not a part of the number 5; it is the number 5.
This convention is needed in order to allow the learner to use the part-whole relations to reduce the
complexity of pattern matching and pattern induction. Without it, matching and induction would
be much slower, by several orders of magnitude (see section 18.1).



Boxes and the compass points 

To represent the geometric information in two-dimensional forms, some definitions are
needed. The rectangle that an object fits into is called a box. A box has four properties (left,
right, top, and bottom) whose values are Cartesian coordinates in the plane of the image. X:Left
will mean the location of the left edge of the box of X. Given this nomenclature, the way the
grammar represents geometric relationships can be defined. The grammar rules have a modifier
that specifies one of the compass points as the direction that their constituents run:



X ---> Y Z ; HORIZ         means   Y:Right = Z:Left, and
                                   Y:Top = Z:Top, and
                                   Y:Bottom = Z:Bottom

X ---> Y Z ; VERT          means   Y:Bottom = Z:Top, and
                                   Y:Left = Z:Left, and
                                   Y:Right = Z:Right

X ---> Y Z ; SUPERSCRIPT   means   Y:Right = Z:Left, and either
                                   ½(Y:Top + Y:Bottom) = Z:Bottom or
                                   Y:Top = ½(Z:Top + Z:Bottom)

X ---> Y Z ; SUBSCRIPT     means   Y:Right = Z:Left, and either
                                   ½(Y:Top + Y:Bottom) = Z:Top or
                                   Y:Bottom = ½(Z:Top + Z:Bottom)


As these definitions indicate, adjacency is defined as two constituents having boxes that share a
boundary; they abut. Thus, Y and Z are horizontally adjacent if Y:Right = Z:Left. The relationship
of a box to its constituents' boxes is one of containment. In the above rules, the box assigned to X
must properly contain those assigned to Y and Z, and furthermore, no other boxes than those of Y
and Z may overlap X's box. For example, the parse of a rational equation depends on a gross
vertical expansion of the box assigned to the equality sign:



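[Figure: a rational equation built from the symbols 3, X, 4 and 8, with the box of the equal sign
expanded vertically.]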



The box for the equal sign contains only "=" and blank space.

In the case of the rules for superscripts and subscripts, a disjunction is needed to handle cases
where either the base or the exponent is much larger than the other:

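[Figure: a tall bracketed base containing 3 + X, with a small exponent 2.]
      Base:Top = ½(Exponent:Top + Exponent:Bottom)

[Figure: a small base 2, with a tall bracketed exponent ending in + X.]
      ½(Base:Top + Base:Bottom) = Exponent:Bottom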
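
The four direction modifiers can be written as predicates over boxes. This is a minimal Python sketch under stated assumptions: the Box class and function names are hypothetical, coordinates are compared with exact equality as in the definitions above (a real parser of handwritten input would presumably need a tolerance), and the example uses coordinates in which Top is numerically larger than Bottom.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    left: float
    right: float
    top: float
    bottom: float


def mid(box):
    """Vertical midpoint of a box, i.e. 1/2 (Top + Bottom)."""
    return 0.5 * (box.top + box.bottom)


def horiz(y, z):
    return y.right == z.left and y.top == z.top and y.bottom == z.bottom


def vert(y, z):
    return y.bottom == z.top and y.left == z.left and y.right == z.right


def superscript(y, z):
    # Either the base's midline meets the exponent's bottom (tall base),
    # or the base's top meets the exponent's midline (tall exponent).
    return y.right == z.left and (mid(y) == z.bottom or y.top == mid(z))


def subscript(y, z):
    return y.right == z.left and (mid(y) == z.top or y.bottom == mid(z))


base = Box(left=0, right=2, top=4, bottom=0)      # a tall base
exponent = Box(left=2, right=3, top=3, bottom=2)  # sits on the base's midline
index = Box(left=2, right=3, top=2, bottom=1)     # hangs from the base's midline
```

With these boxes, `superscript(base, exponent)` and `subscript(base, index)` hold, while `superscript(base, index)` does not.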

Special notational devices: bars, crossouts 

Besides the compass points, there are two other rule modifiers. These deal with lines of
various kinds, rather than alphanumeric symbols. Mathematics seems to use vertical and horizontal
bars not as constituents of objects, but as markers for boundaries. (This idea was originally
suggested to me by Jim Greeno.) It solves many problems. For instance, the bar of multicolumn
arithmetic problems can't be considered a constituent of the problem because it would block one
from using columns as constituents. That is, in the problem

6 8 7 
- Z 3 0 
2 5 7 

there are three columns, but all of them intersect the same bar. If the bar is treated as a symbol in
the same way that the digits are, then it would have to be shared in some way by all three columns.
This is impossible in the context-free grammar formalism. The way the current grammar language
handles this is to provide a rule modifier that specifies that the boundaries between the constituents
paired by the rule be darkened. A rule for subtraction columns would be

ACOL ---> COL (DIGIT) ; VERT BARRED

COL ---> DIGIT (DIGIT) ; VERT UNBARRED

The first rule describes the column as a COL above an optional answer digit, separated by a bar.
The second rule describes COL as a digit above an optional digit, and these must not have a bar
between them.

A second rule modifier is needed to handle the scratch marks that students use to cross out
symbols. The slash or X put over a symbol is not the same sort of constituent as regular
alphanumeric symbols. It overlaps other symbols. No other mathematical symbols overlap. The
grammar language provides a special annotation to indicate whether a constituent must be crossed
out or not crossed out. For instance, to accommodate the stack of crossed-out numbers that can
occur when borrowing across zero, the grammar might use the rules:

XNUM ---> NUM (/NUM)+ /NUM ; VERT UNBARRED

COL ---> CELL (%DIGIT) ; VERT UNBARRED

CELL ---> DIGIT
CELL ---> XNUM

The first rule defines an XNUM as a number with some crossed-out numbers beneath. The "/"
means that the constituent must be crossed out; /NUM is a number with a slash or an X through it.
A COL is defined as either a digit or an XNUM on top of an optional digit. The "%" indicates that
the digit may not be crossed out.

Chapter 16
Summary: Representation level

The representation level, chapters 9 through 15, discusses what kinds of constraints should be
put on the way student knowledge is represented in the model. These hypotheses define a formal
knowledge representation language. More importantly, they place constraints on the kinds of
learning and problem solving that the model can do. They affect the predictions made by the
model. They are chosen to make the model's predictions fit the data. The constraints on
representation are empirical hypotheses. Unlike most AI research on representation languages, the
aim is not to define a language that allows expression of subtle epistemological distinctions or a
language that promotes mental hygiene among the knowledge engineers that use it. The aim is
quite different. It is to define a language that is true. The question is, what is it true of? Two
answers, the "mentalese" interpretation and the "relevance" interpretation, seem plausible to me.

The mentalese interpretation

One view is that information in the mind has its own structure, the mind's mentalese (Fodor,
1975). On this view, the representation language is true of the subjects' mentalese in the same sense
that a learning model is true of the subjects' learning. Learning is an internal information process;
mentalese is an internal information structure. Neither learning nor mentalese can be directly
observed, although their effects can be. The constructions of the representation language, e.g.,
grammars, goal stacks and the like, are taken as describing mentally held information structures.
Hypotheses about the representation language are true or false in the same way that hypotheses in
physics are true or false. This interpretation of the hypotheses is simple, traditional and elegant.
However, I find it a little hard to square with introspections on my own cognition. The other
interpretation, based on relevance, is ontologically verbose but more intuitively acceptable.

The relevance interpretation 

A procedure is a way of describing a systematic sequence of actions that change the state of
the problem. Defined this way, two issues immediately become apparent. One issue concerns the
nature of the interface between the procedure and the environment: What is the vocabulary of
manipulative actions that the procedure can employ, and what is the vocabulary of descriptions of
the environment that the procedure uses in guiding its choices? This issue, the interface issue, asks
about the range of primitive, individual input/output actions the procedure employs. The second
issue addresses the procedure's internal, runtime state. What kinds of actions can the procedure
make to change the internal state? How does the procedure view or structure the internal state?

Presumably, people have much more information than they actually use as they execute their
procedures. This holds for both interface information and internal state information. People can
see much more on a page that bears an arithmetic problem than they deem relevant to its solution.
Similarly, they remember much more about what they have already done in solving it than they
deem relevant. A subject might remember that the last subtraction fact was very hard to
remember or that the tens column's borrow was interrupted in order to watch an airplane fly by.
The real internal state of a human procedure is just as rich in irrelevant detail as the real written
problem.

One interpretation of the hypotheses governing the representation language is that they
describe kinds of information people deem relevant to learning and problem solving. On this
relevance-based view, when the theory uses non-overlapping aggregate objects, it is not claiming
that people can't see overlapping aggregates. When it claims that there is a simple goal stack rather
than the spaghetti stack that coroutines use, it is not claiming that people cannot do coroutines. In
both cases, the claim is only that when people learn arithmetic, they act as if they believed that only
non-coroutine procedures are relevant and only non-overlapping aggregate objects are relevant. The
hypotheses are constraints on learning, although they are expressed as constraints on the kind of
information structures that are learned.

There is nothing inconsistent with holding both the mentalese and the relevance
interpretations of the representation language. It could well be that the structure of mentalese
causes only certain information to be relevant. It could also be that relevancy causes procedural
information in the mind to take on a certain structure. The best way to find out what is really
going on is to push the empirical examination of the representation language as far as possible. The
hope is that when a great deal is known about the kinds of knowledge structures that optimize the
fit of various cognitive models to empirical evidence, then the answer to such questions of
interpretation will be obvious.

Preview 

This chapter summarizes the representation level in two ways. First, it traverses the main
principles of the representation, briefly mentioning their supporting arguments. Second, it updates
the formal model that was presented in chapter 7, the summary to the architectural level. It will be
shown that almost all of the model is entailed by the hypotheses that define the representation. In
particular, it is shown that only five issues remain to be discussed in the following level, the bias
level.

16.1 The interface issue

It was just mentioned that the central issues of representation are the interface issue and the 
internal state issue. The interface issue will be summarized first.

The interface issue divides into subissues. One concerns the descriptive vocabulary used by
the procedure to format its access and manipulations of the problem state. Speaking
metaphorically, the issue is how the procedure understands the problem space. In particular,
what kinds of objects does it think can exist? What is its private ontology? A common technique
used by AI learning models for specifying an ontology over problem states is to equip the
procedure with an explicit set of primitive relations. This is not such a good practice since the set
must be specified differently for different tasks. A subtraction procedure views its problem states
differently than an algebra equation-solving procedure does. Thus, the set must be tailored by the
theorists for each task, and possibly for each subject.

A less tailorable alternative is to fix the relations that are relevant in all tasks in the domain
and vary only the task-dependent ones. Taking this tack, chapter U found that the constant
relations were spatial ones: vertical, horizontal, part-whole, adjacency, order, and boundary points.
The task-dependent relations all concerned aggregate objects, that is, objects like columns or equations
that are groups of other aggregate objects or individual characters. These considerations motivate
the following hypotheses:

Spatial relations

The following 5 relations are the spatial relations:

(First? S x)        Object x is the first part of some sequential object S.

(Last? S x)         Object x is the last part of some sequential object S.

(Ordered? S x y)    Object x comes before y in some sequential object S.

(Adjacent? A x y)   Object x is adjacent to y in some aggregate object A.

(!Part x A)         Object x is a part of aggregate object A.

Grammars 

Aggregation of symbols into groups is defined by a spatial grammar based on the notions
of sequence, part-whole and the compass points: horizontal, vertical and the two
diagonals. For each aggregate object defined by the grammar, a new categorical relation
is defined.

Relations 

The relations available to patterns are the spatial relations, the categorical relations
defined by the grammar, and the usual arithmetic predicates.

The grammar expresses the student's ontology, or rather, that part of the student's ontology that the
student considers relevant to the task. The procedure's patterns are couched in terms of the
aggregate objects defined by the grammar (via the categorical relations), the spatial relations, and a
few arithmetic predicates, such as LessThan? and Equal?.

The problem state is viewed as a gestalt. A problem state usually has many locally
well-defined aggregate objects that don't fit into a globally coherent parse. When a procedure
manipulates a problem state, it doesn't use those. It uses only the aggregate objects that participate
in a global parse. A grammar is used in preference to a set of local definitions for aggregate objects
because it allows the theory to capture this gestalt use of notational knowledge with a simple
hypothesis:

Global matching

The set of objects in a problem state that patterns can match against is limited to those 
that participate in a maximal parse of the problem state as determined by the grammar. 

The grammar serves two purposes: it defines the ontology of the problem space, and it filters out
aggregate objects that are incoherent in a gestalt view of the problem.

Given a problem state, the grammar determines the set of objects, aggregate objects and
relationships that the procedure "considers" potentially relevant. However, the procedure needs
some mechanism to access this field. In particular, it needs to search this field in order to find
appropriate objects to shift its attention to. The search problem is another important interface issue.
Chapter 12 argued that search is a skill that is not acquired in the same way that the subtraction
procedure is acquired. In particular, there are no lessons that teach search loops. The conclusion is
that the search skill is in place before subtraction is taught. This means that procedures need only
convey to this preexisting facility what it is that needs to be found. The search facility will find it if
it is a part of the problem state. The descriptions are called patterns, and the search skill is called
pattern matching. The hypothesis that captures the theory's chosen solution to the search problem
is

Pattern 

Procedures have patterns which are matched against the current problem state. 

Having patterns creates the problem of specifying how much descriptive power they may have.
This is another interface problem: the pattern power problem. Chapter 14 inventories the stock of
descriptive devices used in first-order logics. It shows that the one-disjunct-per-lesson hypothesis
entails that patterns should not have disjunctions in them. Instead, disjunctive descriptions should
be named and inserted into the grammar. Similarly, universal quantifiers should be banished to the
grammar along with existential quantifiers when they are used inside the scope of universal
quantifiers. This solution to the pattern power problem means the grammar has a new function: it
is the repository of disjunctive and universally quantified notational descriptions. The arguments
motivated the following hypothesis:

Conjunctive patterns 

Disjunctions, universal quantifiers and narrow scope existential quantifiers are banned
from patterns. Semantically, a pattern is a conjunction of possibly negated predicates on
existentially quantified variables, constants and functions.

Arguments in section 15.1 showed that an elegant equivalent to the show-work principle is
available: functions and constants were banished from patterns and put instead in a special place on
rules. The rule actions may have functions and constants, but patterns are simply relations on
pattern variables.

16.2 The internal state issue 

The internal state issue concerns how the procedure keeps track of what it is doing, in
particular, where it is currently working and what it is intending to do. More accurately, one can
divide the internal state into information that refers to regions in the problem state (focus of
attention) and information that has no external referent (e.g., goals). These kinds of information
correspond roughly to data flow and control flow, respectively. They can be considered to be two
halves of the internal state question.

Several arguments were presented in chapter 9 that show that control flow is best modelled as
a goal stack. A stack-based, recursive control structure enables the learner to acquire
center-recursive subprocedures, such as borrowing from zero, without violating the one-disjunct-per-lesson
hypothesis. It also allows the Backup repair to be defined as popping the goal stack. Stack
popping yields several observed bugs and avoids some star bugs that are generated by other kinds
of Backup (e.g., chronological Backup). The arguments motivate the following hypothesis:

Recursive control structure 

Procedures have the power of push-down automata in that the representation of
procedures permits goals to call themselves recursively, and the interpreter employs a
goal stack.

The second issue concerns how the procedure keeps track of its focus of visual attention. Once
again, the Backup repair is involved in a crucial argument. It is shown that Backup restores not
only the control (goal) component of the execution state, but it restores the focus of attention as
well. This indicates that focus is locally bound. Goals are instantiated with the current focus of
attention. When the stack pops to resume a goal (even if it is popped by a repair), the goal's
original focus of attention becomes current once again. Moreover, it is shown that once focus is
instantiated for a goal, it is shifted only when the goal calls a subgoal. The goal's focus cannot be
reset (i.e., there is no SETQ for focus). These two aspects together mean that the procedure is
applicative:

Applicative data flow

Data flow is applicative. The data flow (focus of attention) of a procedure changes if and
only if the control flow also changes. When control resumes an instantiation of a goal, the
focus of attention that was current when the goal was instantiated becomes the current
focus of attention.

In short, the procedure moves both kinds of internal state together. Although one, focus of
attention, refers to the external world and the other does not, both are stored together on a stack.
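
The two hypotheses can be pictured with a toy interpreter whose stack frames pair a goal with the focus it was instantiated with. This is a minimal Python sketch, not Sierra's interpreter. The class, the goal names and the focus labels are all hypothetical.

```python
class GoalStack:
    """Toy goal stack: each frame is an immutable (goal, focus) pair."""

    def __init__(self, top_goal, focus):
        self.stack = [(top_goal, focus)]

    @property
    def current(self):
        return self.stack[-1]

    def call(self, subgoal, new_focus):
        # The only place focus can change: together with a change of control.
        self.stack.append((subgoal, new_focus))

    def pop(self):
        # Exiting a goal (or a Backup repair) resumes the caller, and the
        # caller's original focus becomes current again.
        self.stack.pop()
        return self.current


run = GoalStack("SUB1COL", "column-1")
run.call("BORROW", "column-1")
run.call("OR13", "top-digit-of-column-2")
resumed = run.pop()
```

Because frames are tuples, there is no operation that resets the focus of a goal already on the stack, which is the sense in which there is no SETQ for focus. After the pop, `resumed` is the BORROW frame with its original focus on column-1.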

Given that the procedure keeps goals on a stack, there needs to be some convention for when
to pop the stack. That is, an exit convention is needed to indicate when a goal is satisfied and may
be popped. The arguments concerning this issue are a little but there are several and they all
point to the same conclusion. An elegant and simple model results when goals are given a binary
type. (There is preliminary evidence for a third goal type, a Foreach loop, but it has not yet been
incorporated into the model and tested.) AND goals execute all their rules; OR goals execute just
one:

And-Or 

Goals bear a binary type. If G is the current goal in runtime state S, then

(ExitGoal? S) is true if
1. G is an AND goal and all its rules have been executed, or
2. G is an OR goal and at least one of its rules has been executed.
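
The exit convention is simple enough to state as a predicate. The sketch below is a Python illustration only. It abstracts the runtime state S down to three assumed quantities: the type of the current goal, the number of rules it has, and the number executed so far.

```python
def exit_goal(goal_type, n_rules, n_executed):
    """True when the current goal is satisfied and may be popped."""
    if goal_type == "AND":
        return n_executed == n_rules   # all rules have been executed
    if goal_type == "OR":
        return n_executed >= 1         # at least one rule has been executed
    raise ValueError(f"unknown goal type: {goal_type!r}")


and_done = exit_goal("AND", n_rules=3, n_executed=3)
and_pending = exit_goal("AND", n_rules=3, n_executed=2)
or_done = exit_goal("OR", n_rules=3, n_executed=1)
```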

The central debate over exit conventions concerned the deletion operator. Ultimately, it was shown
that the best formulation of the deletion operator was:

AND rule deletion

(Delete P) returns a set of procedures P' such that each P' is P with one or more AND
rules deleted from the most recently acquired subprocedure.

Most recent rule deletion

(Delete P) returns a set of procedures P' such that each P' is P with one or more rules
deleted from the most recently acquired subprocedure.

Since this formulation depends crucially on the AND/OR type difference, it supports the And-Or
hypothesis.
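
To make the AND rule deletion formulation concrete, the following Python sketch enumerates the procedures it would return for a toy procedure. The encoding is an assumption made for illustration: a procedure is a dictionary from goal names to a (type, rules) pair, and the goals of the most recently acquired subprocedure are passed in explicitly. Whether a goal may lose all of its rules is also an assumption here.

```python
from itertools import combinations


def delete(procedure, newest_goals):
    """Return every variant of `procedure` with one or more AND rules
    removed from the goals in `newest_goals`."""
    candidates = [
        (goal, rule)
        for goal in sorted(newest_goals)
        if procedure[goal][0] == "AND"
        for rule in procedure[goal][1]
    ]
    variants = []
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            doomed = set(subset)
            variants.append({
                goal: (gtype, tuple(r for r in rules if (goal, r) not in doomed))
                for goal, (gtype, rules) in procedure.items()
            })
    return variants


procedure = {
    "SUB": ("OR", ("r1",)),
    "BORROW": ("AND", ("b1", "b2")),
    "OR13": ("OR", ("o1",)),
}
variants = delete(procedure, {"BORROW", "OR13"})
```

In this toy case the newest subprocedure has two AND rules, so three variants result. The OR rule of OR13 is never deleted, which is where the formulation depends on the AND/OR type difference.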

When the syntax of the procedure is considered (in chapter 15), the And-Or types play a
central role. The type of a goal determines not only when a goal is exited but also how its rules'
patterns are interpreted. The patterns of an AND goal's rules are interpreted as fetch patterns. They
are used to shift the focus of attention. The patterns of OR goals are used as applicability
conditions. An applicability condition must be true for the rule to be eligible for execution. In the next
level, chapters 17 to 20, it is shown that these two kinds of patterns are subject to quite different
learning biases. They are quite different not only in function but in content and acquisition. This
distinctiveness reflects on the original AND/OR distinction, adding a little more support to the
principle.

This completes the synopsis of the hypotheses introduced by the representational level. The
remainder of this chapter spins out their implications for the formal model. First the learner is
considered, then the interpreter and the local problem solver.

16.3 The learner

At the architectural level, the learner was specified in terms of three undefined functions:
Disjoin, Induce and Practice. These functions had to be specified informally since the
formal representations for procedures had not yet been defined. The representation level has
specified the needed representation language. The three functions can be defined. However, it
turns out that the constraints imposed by the representation are not quite powerful enough for a
complete definition. The learner overgenerates, producing millions of procedures for each lesson.
Fortunately, it is not hard to see where the missing constraints go. This subsection explains the
definition of the learning model, showing where the missing hypotheses go. The next level, the bias
level, discusses what those hypotheses should be. Formally, three new undefined functions will be
used to indicate where the missing hypotheses go. The old undefined functions, Disjoin, Induce
and Practice, will be given definitions in terms of the new undefined functions. The informal
definitions of the old functions that were given in the architectural level are repeated below:

(Induce P XS)     represents disjunction-free induction. The first argument, P, is a procedure.
                  The second, XS, is a set of worked example exercises. Induce returns a set
                  of procedures. Each procedure is a generalization of P that will solve all
                  the exercises the same way that they are solved in the worked examples.
                  Induce is not permitted to introduce disjuncts. If the procedure cannot be
                  generalized to cover the examples, perhaps because a disjunction is needed,
                  then Induce returns the null set.

(Disjoin P XS)    represents the introduction of a disjunct (e.g., conditional branch) into P,
                  the procedure that is its first argument. The second argument, XS, is a set
                  of worked example exercises. Disjoin returns a set of procedures. Each
                  procedure has had one disjunct introduced into it. The disjunct is chosen in
                  such a way that Induce can generalize the procedure to cover all the
                  examples in XS.

(Practice P XS)   represents another kind of disjunction-free generalization, one driven by
                  solving a set of practice exercises, XS. Practice returns a set of
                  procedures. Each procedure is a generalization of its input procedure P.

(Delete P)        represents deletion. Parts of the input procedure P are deleted. Delete
                  returns a set of procedures resulting from various deletions.

The representation makes a distinction between control structure (goal hierarchy) and data flow
structure (goal arguments, rule patterns and rule actions). The simplest way to deal with these two
structurally dissimilar kinds of information is to assign their acquisition to different functions.
Disjoin will be in charge of adding the new goal structure; Induce and Practice will add
everything else. That is, Disjoin grafts a skeletal version of the new subprocedure onto the old
procedure. The skeleton has goals and rules, but the goals lack arguments, the rules lack patterns,
and the rules' actions lack arguments. Only the goal topology is fixed by Disjoin. Induce and
Practice flesh out the skeletons found by Disjoin. They do pattern induction and function
induction in order to add patterns and action arguments to the new subprocedure. The reason for
dividing the labor this way is that pattern and function induction are disjunction-free inductions.
The only disjunct introduced by a lesson is in the goal structure. It is at the parent OR, the place
where the new subprocedure adjoins the old procedure. Finding and adding that disjunct is
Disjoin's job. It requires a very different kind of algorithm than disjunction-free induction. With
these introductory comments said, each of the previously undefined functions will be defined.

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Disjoin 



In Sierra, Disjoin is implemented by a context-free parsing algorithm. Given an example,
Disjoin parses it using the old procedure as if it were a context-free grammar, and using the
example's problem state sequence as if it were a string of primitive actions. It can do this because
(1) the procedure representation language is recursive, (2) goals have only two types, AND and OR,
and (3) data flow is applicative. These properties are, of course, main results of the representation
level. Because they are true of procedures, procedures can be used like context-free grammars to
parse examples. When Disjoin parses an example, it will not be able to parse it completely using
the old procedure. The example uses the new subprocedure, which the old procedure does not
have. However, by guessing all possible skeletal subprocedures before it parses, Disjoin can
figure out which of the possible skeletons will allow the example to be parsed. Let
(Skeletons P X) be a function that returns all skeletons that allow P to parse the example X.
The next step is to apply Skeletons to all the examples and then take the intersection of the
resulting sets of skeletons. Let

    (SkeletonIntersection P XS) = ⋂ (Skeletons P X), the intersection taken over all X ∈ XS

SkeletonIntersection returns every skeletal subprocedure such that adjoining the
subprocedure to the old procedure would create a procedure that is consistent with all the examples
in XS. Section 19.1 shows that Disjoin cannot be defined solely as SkeletonIntersection.
This would cause it to output skeletons that lead to star bugs. Apparently, students have some
biases concerning the choice of control structures for their new subprocedures. Hypotheses are
needed to capture these biases. Let InduceSkeleton be an undefined function to capture the
control structure biases of students. In effect, it returns some subset of the skeletons returned by
SkeletonIntersection. Given InduceSkeleton, the definition for Disjoin is

    (Disjoin P XS) =
        (Adjoin P (InduceSkeleton P XS))

Disjoin outputs a set of new procedures. Each output procedure is a copy of P with a new
skeletal subprocedure attached to it. Adjoin is a trivial function that attaches a set of skeletal
subprocedures to P, producing a set of procedures. Each one of these new procedures will be
submitted individually to Induce.
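
The two definitions above can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration: procedures are modelled as tuples, skeletons as strings, and `toy_skeletons` and `unbiased_induce_skeleton` are hypothetical stand-ins for Skeletons and InduceSkeleton (the latter applies no bias at all, which is exactly what the text says the real function must not do).

```python
from functools import reduce


def skeleton_intersection(p, xs, skeletons):
    """Return the skeletons that let procedure p parse every example in xs."""
    per_example = [frozenset(skeletons(p, x)) for x in xs]
    if not per_example:
        return frozenset()  # assumption: with no examples there is nothing to adjoin
    return reduce(lambda a, b: a & b, per_example)


def adjoin(p, skeleton_set):
    """Attach each skeleton to a copy of p, giving one new procedure per skeleton."""
    return {p + (skeleton,) for skeleton in skeleton_set}


def disjoin(p, xs, induce_skeleton):
    """(Disjoin P XS) = (Adjoin P (InduceSkeleton P XS))."""
    return adjoin(p, induce_skeleton(p, xs))


# Hypothetical toy data: the skeletons that allow the old procedure to parse each example.
TOY_SKELETONS = {
    "x1": {"s1", "s2", "s3"},
    "x2": {"s2", "s3"},
    "x3": {"s2", "s4"},
}


def toy_skeletons(p, x):
    return TOY_SKELETONS[x]


def unbiased_induce_skeleton(p, xs):
    # Stand-in with no bias: it keeps the whole intersection.
    return skeleton_intersection(p, xs, toy_skeletons)


old_procedure = ("old-procedure",)
new_procedures = disjoin(old_procedure, ["x1", "x2", "x3"], unbiased_induce_skeleton)
print(new_procedures)
```

Only the skeleton shared by all three toy examples survives the intersection, so a single new procedure is produced. A biased InduceSkeleton would return a subset of that intersection.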



Induce

[Figure 16-1: A skeletal subprocedure, with goals labelled Kid1, Kid2 and Kid3. Rule 1 is the
adjoining rule. Goals bear generic names.]

Figure 16-1 shows a typical subprocedure. Rules 1 through 7 are new. They were built by
Disjoin. Rules 8 and 9 are part of the old procedure. The new AND goal and the three trivial
OR goals are new. The parent OR is old, as are the three goals labelled Kid1, Kid2 and Kid3.
Induce's job is to flesh out the new goals and the new rules by giving them arguments and
patterns. More specifically, it has four tasks, most of which are simple bookkeeping:



1. Test patterns: New OR rules need to be given test patterns. For the trivial OR rules, such as
rules 5, 6 and 7 in figure 16-1, the test pattern is the null pattern, {} (recall that {} always
matches, therefore it is always true). Inducing the test pattern for the adjoining rule (e.g., rule
1 in figure 16-1) is a difficult task. Let InduceTest be a new undefined function that
calculates a test pattern for the adjoining rule. Its definition will be discussed in a moment.

2. Fetch patterns: New AND rules (e.g., rules 2, 3 and 4 in figure 16-1) must have their fetch
patterns induced. This is another difficult induction problem. Let InduceFetch be a new
undefined function that solves it. Its definition will be discussed in a moment.

3. Arguments: The new goals and the new rules' actions both need to be assigned arguments.
Most of the time, this is easy. Since OR rules don't shift focus (see section 15.4), the
arguments of the parent OR can be copied and used both for the adjoining rule's action
arguments and for the new AND goal's arguments. Similarly, the arguments of the Kids can
be copied and used both as the arguments of the trivial OR goals and as the arguments of the
trivial OR rules' actions. The only arguments left are the action arguments for the new AND
rules (e.g., rules 2, 3 and 4 in figure 16-1). InduceFetch determines these automatically:
A fetch pattern represents focus shifting. The new shifted focus is bound to certain of the
pattern's variables. These variables are used as action arguments of AND rules. So, finding
arguments for each of the goals and actions is generally just a matter of bookkeeping. There
is an extra twist, however, under circumstances that are explained in the next paragraph.

4. Functions: If any of the Kid goals is a writing action, as is often the case, then it will require
the OR rule that calls it to pass it a number or other symbol to write. Unless this symbol is a
direct copy of a visible one, it must be calculated by a nest of functions. This nest is placed
in the OR rule's action argument. Inducing what this function nest could be is non-trivial, so
it too will be assigned an undefined place-holding function, InduceFunction.
Out of all these bookkeeping operations, three critical tasks emerge: (1) test pattern induction,
(2) fetch pattern induction, and (3) function induction. This makes some intuitive sense. When one
learns a subprocedure, one first needs to discover when it is applicable (test pattern induction). For
each action in the subprocedure, one needs to discover where the action should be located (fetch
pattern induction) and what new numbers it needs, if any (function induction). These three critical
tasks have been formalized as three undefined functions, InduceTest, InduceFetch, and
InduceFunction. Defining these functions is the business of the next level, the bias level.



Practice

The function Practice is just like Induce except that it has fewer opportunities to do
induction. It must use whatever the procedure has for test patterns, fetch patterns and functions in
order to answer the practice problems. However, it may, in some cases, be able to narrow the space
of patterns or functions a little due to special characteristics of the practice problems. In general,
Practice makes little difference in the model's predictions, so no more will be said of it.

Delete

The Delete function is simple to define given the AOG representation. It inputs a procedure
constructed by Disjoin, Induce and Practice. It outputs a set of procedures. Each of the
output procedures has had the rules of its new AND replaced by a proper, non-empty subset of
those rules. Given a new AND with two rules, Delete outputs two procedures (deleting all the
new AND's rules is pointless; it merely "takes back" the lesson, yielding no new predictions). Given
a new AND with three rules, Delete outputs six procedures.
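
The counts quoted above (two procedures for two rules, six for three) follow from enumerating the proper, non-empty subsets of the new AND's rules. A small Python sketch of that enumeration is given here; the function name `delete_variants` and the rule labels are hypothetical, and only the rule list of the new AND is modelled, not a whole procedure.

```python
from itertools import combinations


def delete_variants(new_and_rules):
    """List every proper, non-empty subset of the new AND's rules, keeping rule order."""
    rules = list(new_and_rules)
    variants = []
    # Sizes 1 .. n-1: size 0 would take back the lesson, size n would delete nothing.
    for size in range(1, len(rules)):
        for kept in combinations(rules, size):
            variants.append(list(kept))
    return variants


print(len(delete_variants(["rule2", "rule3"])))           # two rules
print(len(delete_variants(["rule2", "rule3", "rule4"])))  # three rules
```

In general a new AND with n rules yields 2^n - 2 deletion variants.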



Summary

Defining the representation language allowed subprocedure induction to be almost completely
defined. The above discussion sketches a formal treatment of subprocedure induction. It has been
informal in places because a good deal of the subprocedure induction is tedious bookkeeping. Four
new undefined functions were introduced:

    InduceSkeleton
    InduceTest
    InduceFetch
    InduceFunctions

As the names indicate, each function is an inducer. It outputs only generalizations that are
consistent with the examples. However, it will soon be shown that pure induction is, in each case,
too unconstrained. The functions generate generalizations that people are never observed to
acquire. To get the model's predictions to match human learning, bias predicates governing each
function need to be defined. That is all that is left to do. The hypotheses defining the
representation are so powerful that they almost completely define the learning algorithms that the
learner must use.

16.4 The solver

The previous section partially defined the learner. This section does the equivalent exercise for the
solver. The architectural level defined the model's overall problem solving behavior in terms of
three undefined functions, whose informal definitions, repeated from chapter 8, are:
(Interpret P S) represents one cycle of the normal interpretation (execution) of the
procedure P. The second argument, S, is a runtime state: a composite of the
internal (interpreter) state and the external (problem) state. Interpret
returns the next runtime state. Interpret is defined by the representation
language used for procedures.

(Impasse S) is a predicate that is true when the runtime state S is an impasse. It is
implemented by a set of impasse conditions. If any impasse condition is
true, then Impasse is true. Impasse represents the problem detection
component of local problem solving.

(Repair S) represents the other half of local problem solving, problem rectification or
repair. It is implemented by a set of repairs, such as Noop and Backup.
Repair returns a set of states. Each state results from the action of one of
the repairs on the input state S.

The aim of this section is to define these functions. It turns out that their definition will be
incomplete. Two undefined functions are needed. These in turn become the target of the next
level's investigation.



Runtime state

The functions use a runtime state. Now that the representation language has been defined,
the runtime state can be formalized. As stated in the architecture level, the runtime state is a pair:
an internal (execution) state and an external state. The external state is just a problem state. The
internal state has two components, a stack and a single bit of global state, called microstate.
Microstate is used to remember whether the interpreter's last manipulation of the stack was a push
or a pop. Microstate is needed for formalizing the Backup repair. Backup pops the stack, which
automatically sets the microstate to Pop. But Backup needs to have the interpreter resume the goal
that Backup left on the top of the stack rather than pop it. To cause this to happen, it resets
microstate to Push. This fools the interpreter, causing it to interpret the top goal as if it had just
been pushed onto the stack. If microstate were not available, Backup would be less simple to
formalize.

The other component of the internal state is the goal stack. The stack needs to have more
than just goals in it. Each element of the stack (a stack frame) needs to have three components:

    Goal            The name of the goal.

    Bindings        Variable-value pairs that represent the bindings of the goal's arguments.

    ExecutedRules   The goal's rules that have already been executed.

The need for Bindings follows immediately from the principle that data flow is applicative. The
set of executed rules is needed so that AND goals may work properly. If the interpreter doesn't
know which AND rules have already been executed, it will just execute the first AND rule over and
over. For AND goals to have their intended meaning, ExecutedRules must be a part of the
goal's state. In production systems, the same effect is achieved by the refractoriness conflict
resolution principles (McDermott & Forgy, 1978).
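
As a rough illustration of the runtime state just described, the following Python sketch models the stack frames and the microstate bit. The class names, the string encoding of the microstate, and the goal names in the usage lines are assumptions made for the example; the text only fixes the three frame components and the rule that pushing and popping set the microstate.

```python
from dataclasses import dataclass, field


@dataclass
class StackFrame:
    goal: str                                           # Goal: the name of the goal
    bindings: dict = field(default_factory=dict)        # Bindings: variable-value pairs
    executed_rules: list = field(default_factory=list)  # ExecutedRules


@dataclass
class RuntimeState:
    problem_state: object = None                # external (problem) state
    stack: list = field(default_factory=list)   # internal state: the goal stack
    microstate: str = "Push"                    # internal state: last stack operation

    def push(self, goal, executed_rules=None, bindings=None):
        frame = StackFrame(goal, dict(bindings or {}), list(executed_rules or []))
        self.stack.append(frame)
        self.microstate = "Push"

    def pop(self):
        frame = self.stack.pop()
        self.microstate = "Pop"
        return frame

    def top(self):
        return self.stack[-1]


# Hypothetical goal names, used only to exercise the structure.
state = RuntimeState(problem_state="53 - 19")
state.push("SubtractColumn", bindings={"C": "units"})
state.push("Borrow", bindings={"C": "units"})
state.top().executed_rules.append("rule-2")
state.pop()
print(state.top().goal, state.microstate)
```

Keeping the executed rules inside each frame, rather than globally, is what lets a resumed AND goal continue with its next rule instead of restarting.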
Interpret

The Interpret function executes the procedure on the runtime state that it is given. It
changes the state in ways directed by the procedure. It doesn't do much, only one "cycle" of
interpretation. Solving a subtraction problem requires hundreds of calls to Interpret. Given
the hypotheses on representation, Interpret can be almost completely defined. It can't be totally
specified because pattern matching hasn't been completely defined. Two undefined functions, Test
and Fetch, will be used to represent how test patterns are matched and how fetch patterns are
matched:

(Test S P)    Given a runtime state S and a pattern P, Test returns true or false.

(Fetch S P)   Given a runtime state S and a pattern P, Fetch returns a set of binding sets for
              P's variables.

Each assignment of values to variables is a binding set. Fetch returns a set of binding sets because
the pattern may match several ways. Or it may not match at all, in which case the set that Fetch
returns would be empty. The local problem solver, which runs between each cycle of Interpret,
checks for these anomalous matches and repairs them. The bias level discusses exactly what kinds
of matches the local problem solver treats as anomalous. The bias level also defines exactly what
the matching functions Test and Fetch actually do.

Given the two matching functions, there are many ways to define Interpret. One will be
sketched in order to give a feel for some of the issues involved. This version of Interpret does
either a Push or a Pop whenever it is called. That is, the cycle size is set at single pushes and pops.
Finer cycle sizes are possible. For instance, each binding of a goal argument could count as a cycle
of the interpreter. Although I've tried many cycle sizes for Interpret, none seem to have any
advantages over the others.

Figure 16-2 gives the code for this version of Interpret and the minor functions that it
employs. Interpret has two basic cases: (1) If microstate is Push, then the current goal has just
been started up. If it is a primitive goal, then the interpreter just executes it; otherwise, a rule is
chosen and executed. (2) If microstate is Pop, then the current goal has just had one of its rules
executed. The choice is between resuming it, by choosing a rule and executing it, or exiting the
goal by popping the stack.

The exit conventions of the interpreter are implemented by ExitGoal?. If the top goal is an
AND, it is popped only when all its rules have been executed. If the top goal is an OR, it's done as
soon as any of its rules are executed.

The conflict resolution strategies of the interpreter are implemented by PickRule. It makes
critical use of the order of a goal's rules, and rules are in the order that the learner saw them
being executed in the worked examples. The first rule on the goal's list is the first rule executed.
Because the AOG language represents all control choice as OR goals, the patterns of AND rules are
not used as applicability conditions. In particular, PickRule just takes the next unexecuted rule
on the AND's list without doing any pattern matching. For OR rules, PickRule tests for
applicability using the undefined matching function Test. If more than one unexecuted OR rule is
applicable, then PickRule returns the first one on the OR's rule list. The order of rules is used to
represent the chronology of their acquisition. The most recently acquired rule is first. To
summarize, the conflict resolution strategies are: (1) For AND rules, pick the first unexecuted rule.
(2) For OR rules, pick the first (i.e., most recently acquired) unexecuted rule that has a true
applicability condition.
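
The exit convention and the two conflict resolution strategies can be sketched in a few lines of Python. This is only an illustration under stated assumptions: goals and rules are small dataclasses invented for the example, patterns are plain strings, and the matching function is passed in as a callable because Test itself is left undefined at this level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: str = ""   # "" stands for the null pattern, which always matches


@dataclass(frozen=True)
class Goal:
    name: str
    kind: str           # "AND" or "OR"
    rules: tuple = ()


def exit_goal(goal, executed):
    """An AND exits when all its rules have run; an OR exits after any one rule."""
    if goal.kind == "AND":
        return all(rule.name in executed for rule in goal.rules)
    return len(executed) > 0


def pick_rule(goal, executed, test):
    """Return the first unexecuted rule; OR rules must also pass the test."""
    for rule in goal.rules:
        if rule.name in executed:
            continue
        if goal.kind == "AND" or test(rule.pattern):
            return rule
    return None  # no applicable rule: the situation a halt impasse detects


# Hypothetical goals. For the OR, the most recently acquired rule is listed first.
or_goal = Goal("SubtractColumn", "OR", (Rule("borrow-rule", "T<B"), Rule("plain-rule")))
and_goal = Goal("Borrow", "AND", (Rule("decrement"), Rule("add-ten"), Rule("take-diff")))


def make_test(true_patterns):
    return lambda pattern: pattern == "" or pattern in true_patterns


print(pick_rule(or_goal, set(), make_test({"T<B"})).name)
print(pick_rule(or_goal, set(), make_test(set())).name)
print(pick_rule(and_goal, {"decrement"}, make_test(set())).name)
```

Because the newest OR rule comes first, it shadows older rules whenever its test pattern is true, and the older rule is reached only when the newer one does not apply.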
A

(Interpret P S) =
    If (Microstate S) = Push then
        If (Goal-TOS S) is primitive then
            1. (EvalGoal (Goal-TOS S) B S)
            2. (Pop S)
        else (ExecuteRule S (PickRule S))
    else
        If (ExitGoal? S) then (Pop S)
        else (ExecuteRule S (PickRule S)),

where

(ExitGoal? S) =
    Either (Goal-TOS S) is an AND goal, and
           all of its rules are in (ExecutedRules-TOS S),
    or     (Goal-TOS S) is an OR goal, and
           (ExecutedRules-TOS S) is not empty.

(PickRule S) =
    Return the first rule R of (Goal-TOS S) such that
    R is not in (ExecutedRules-TOS S) and
    either (Goal-TOS S) is an AND goal or (Test S (Pattern R)).

(ExecuteRule S R) =
    1. Add R to (ExecutedRules-TOS S).
    2. If (Goal-TOS S) is an OR goal,
       then (InstantiateAction S (Action R) (Bindings-TOS S))
       else (InstantiateAction S (Action R) (Car (Fetch S (Pattern R)))).

(InstantiateAction S A B) =
    1. (Push S (ActionGoal A) {} {})
    2. For each form F in (ActionArgs A)
       as each variable V in (GoalArgs (Goal-TOS S))
       do bind V to (EvalForm F B S) and add the binding into (Bindings-TOS S).

B

(Microstate S)           Returns the current setting of the microstate bit in the runtime state S.
(Goal-TOS S)             Returns the goal on the top of the stack.
(Bindings-TOS S)         Returns the bindings of the goal on the top of the stack in S.
(ExecutedRules-TOS S)    Returns the set of executed rules on the top of the stack in S.
(Pop S)                  Pops the stack of runtime state S.
(Push S G RS B)          Pushes onto the stack of S a new stack frame consisting of G as the goal,
                         RS as the set of executed rules, and B as the set of bindings.
(EvalGoal G B S)         Executes a primitive goal G in the runtime state S using bindings B.
                         Primitive goals, e.g., Write, change the external (problem) state but do
                         not change the internal state.
(EvalForm F B S)         Evaluates a form (i.e., a variable, a constant or a function) in the
                         current state with the bindings B, and returns its value. For variables,
                         it simply looks up the variable's binding in B. Constants are simply
                         returned, with their QUOTE stripped off. Functions, such as the
                         arithmetic facts functions, make no changes to S of any kind.
(Pattern R)              Returns the pattern of the rule R.
(Action R)               Returns the action of the rule R.
(ActionGoal A)           Returns the goal called by the action A.
(ActionArgs A)           Returns the list of forms that are the arguments of the action A.
(GoalArgs G)             Returns the list of variables that are the arguments of the goal G.
(Car X)                  Returns the first element of a list X.

Figure 16-2
(A) Main code for Interpret. (B) Primitive and utility functions.
Focus shifting is accomplished by matching fetch patterns. Since AND rules' patterns are used
as fetch patterns, the function ExecuteRule calls Fetch for AND rules but not for OR rules.
Calling Fetch augments the current bindings with the bindings of the pattern variables. (N.B., the
interpreter assumes the local problem solver will catch any mismatches, so it just uses the first
element of the set of binding sets returned by Fetch.) When the action's goal is instantiated by
InstantiateAction, the goal arguments are bound to the values of certain fetch pattern
variables. This accomplishes focus shifting.

The point is only that the interpreter can be almost completely defined, excepting only the
functions Test and Fetch, and that its definition is rather simple. The AOG language is not very
complex, and neither is its interpreter.

Local Problem Solving

The local problem solver is formalized by a predicate, Impasse, and a function, Repair.
Both are driven by sets. Impasse uses a set of impasse conditions. Repair uses a set of repairs.
These two sets are constant parameters of the model. Although the exact membership of both sets
is still open for investigation, their value, whatever it is, may not be varied across tasks or subjects.
To do so would give so much tailorability to the model that the theory would be difficult to refute.
Holding the sets constant represents the assertion that local problem solving is a widely known,
task-independent skill. It concerns procedures per se, and not just procedures for solving particular
kinds of tasks. Task-independence entails that impasse conditions and repairs mention only aspects
of the execution state. For instance, the Noop repair simply pops the stack; it executes (Pop S).
The Backup repair also uses only the execution state. Its implementation is:

    1. (Pop S)
    2. If (Goal-TOS S) is an AND, then go to 1.
    3. Set (Microstate S) to Push.

This pops the stack to the first OR goal, then resets the execution state so that it will be entered.
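
The two repairs just described are short enough to sketch directly in Python. The state encoding here is an assumption chosen for brevity: a dict holding a list of (goal name, goal type) pairs plus the microstate string, with hypothetical goal names. The guard against an empty stack is also an addition of this sketch; the three-step definition above does not say what happens if no OR goal is found.

```python
def noop(state):
    """Noop repair: simply pop the stack, which leaves the microstate at Pop."""
    state["stack"].pop()
    state["microstate"] = "Pop"
    return state


def backup(state):
    """Backup repair: pop to the first OR goal, then make the interpreter re-enter it."""
    stack = state["stack"]
    stack.pop()                                  # step 1
    while stack and stack[-1][1] == "AND":       # step 2 (empty-stack guard is assumed)
        stack.pop()
    state["microstate"] = "Push"                 # step 3
    return state


# Hypothetical stack, bottom first. The impasse occurred in the top goal.
state = {
    "stack": [("Subtract", "OR"), ("SubtractColumn", "AND"),
              ("Borrow", "AND"), ("Decrement", "OR")],
    "microstate": "Push",
}
backup(state)
print(state)
```

After Backup, the remaining top goal is an OR with microstate Push, so the next interpreter cycle treats it as freshly entered and picks a rule for it.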

The impasse conditions also mention only the execution state. For instance, the following
impasse condition is true if PickRule would fail:

    if
        1. (ExitGoal? S) = false, and
        2. (Goal-TOS S) is an OR goal, and
        3. There is no rule R in the rules of (Goal-TOS S) such that
           R ∉ (ExecutedRules-TOS S) and (Test S (Pattern R)) = true,
    then impasse.

This condition checks for halt impasses, that is, times when the interpreter would have to halt
because no rule applies. Another impasse condition checks for mismatching fetch patterns:

    if
        1. (ExitGoal? S) = false, and
        2. (Goal-TOS S) is a non-primitive AND goal, and
        3. R is the first of its rules that is not in (ExecutedRules-TOS S), and
        4. (Fetch S (Pattern R)) is not a singleton set,
    then impasse.

This impasse condition checks whether the next call to Interpret will call Fetch, then it checks
whether the pattern will mismatch (i.e., whether it will fail to match at all, or more commonly,
whether it matches ambiguously). If so, an ambiguity impasse is signalled. The ambiguity impasse
will play an important role in the bias level.
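
The two impasse conditions can be written as small predicates. The Python sketch below assumes a simplified encoding that is not in the original: the top goal is described by its type, a list of (rule name, pattern) pairs, and the set of executed rule names, and the undefined matchers Test and Fetch are passed in as callables. A goal with no rules is treated here as primitive.

```python
def exit_goal(kind, rules, executed):
    if kind == "AND":
        return all(name in executed for name, _ in rules)
    return bool(executed)


def halt_impasse(kind, rules, executed, test):
    """True when the top goal is an unfinished OR and no unexecuted rule's test matches."""
    if exit_goal(kind, rules, executed) or kind != "OR":
        return False
    return not any(name not in executed and test(pattern) for name, pattern in rules)


def fetch_impasse(kind, rules, executed, fetch):
    """True when the next AND rule's fetch pattern does not match in exactly one way."""
    if exit_goal(kind, rules, executed) or kind != "AND":
        return False
    _, pattern = next((n, p) for n, p in rules if n not in executed)
    return len(fetch(pattern)) != 1


# Hypothetical rules and matcher results, chosen only to exercise each branch.
or_rules = [("borrow-rule", "T<B"), ("plain-rule", "T>=B")]
and_rules = [("decrement", "fetch-BFC"), ("add-ten", "fetch-BIC")]

print(halt_impasse("OR", or_rules, set(), lambda p: False))
print(fetch_impasse("AND", and_rules, set(), lambda p: []))
print(fetch_impasse("AND", and_rules, set(), lambda p: [{"BFC": "tens"}]))
```

Both an empty result and an ambiguous result from the fetch matcher count as a mismatch, since either one fails the singleton check.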

At present, the model uses only five impasse conditions. Two have just been discussed.
Another checks for infinite loops. A fourth causes an impasse when the current problem state is
not syntactically well formed. The fifth impasse condition checks for precondition violations. In a
sense, the precondition impasse condition is special since it must consult non-execution-state
information. It must look up the primitives' preconditions. Technically, this violates the principle
that impasse conditions refer only to the execution state. However, preconditions are inevitable in
any model that has primitives (and all computational models do). Just as the actions of a primitive
operator are beneath the grain size of the model, the impasses of the primitive are also beneath the
grain size. Preconditions represent internal impasses that have been lifted up to the grain size
boundary. For instance, the facts function Sub is a primitive with a precondition. Suppose it were
represented as a non-primitive procedure that, say, uses finger counting to calculate differences. It
might reach an impasse calculating 5-7 when it tries to tick off a finger and finds there are no
more fingers to tick off. The precondition at the Sub-sized grain is a lifting of this impasse to a
higher level. In short, preconditions are as inevitable as primitives. Hence, an impasse condition
that refers to them is also inevitable.

The basic point is that defining the execution state allows defining the repairs and the impasse
conditions. These in turn define the local problem solver.

16.5 Preview of the bias level

The hypotheses on representation have taken us a long way. They not only defined the
representation language, they defined almost all of the learner, the interpreter, and the local
problem solver. The only issues left to discuss concern the six undefined functions mentioned
above:

    InduceSkeleton
    InduceTest
    InduceFetch
    InduceFunctions
    Test
    Fetch

The first four express the biases of the learner. The representational hypotheses defined all possible
patterns and skeletons consistent with a lesson's examples; the bias functions filter out the choices
that human learners are never observed to choose. As will be seen in the next section, these biases
are relative rather than absolute. They compare two choices and say which is better. The
representational hypotheses are absolute. In a sense, they say of a single choice whether or not it is
good. Because the biases are relative, they cannot be built into the representation language.
Representation languages can express only absolute constraints.

The other two undefined functions, Test and Fetch, concern pattern matching. There are
many ways that patterns can be matched against the problem state. For instance, they can be
matched to maximize the number of matched relations, or they can be matched to maximize the
number of bound pattern variables. The matching issues are intimately related to the learner's bias
for pattern inductions. The bias principles express how the learner views the worked problem, and
the matching principles express how this viewpoint is applied to exercise problems. They are duals.
They form two ends of an "informational conduit" between examples and exercises. The next level
discusses both issues together, despite the fact that its name, the bias level, refers to just one.
Chapter 17
Two Patterns or One



At this point in the development of the representation, it has been shown that patterns
represent the interface between the procedure and the current problem state. There are two jobs
that patterns are used for: (1) shifting the focus of attention, and (2) testing applicability conditions
in order to decide which rule to take. It would be parsimonious if the same patterns could be used
both for testing and fetching. This is what production systems do. The pattern on the condition
side of a production rule tests whether the rule is applicable, and it binds variables for use in the
actions of the other side of the rule. However, it turns out that the data force the theory away from
the parsimony of single-pattern rules. Two kinds are needed: fetch patterns and test patterns. This
chapter discusses why both kinds are needed. The distinction between fetch and test patterns was
assumed in the chapter on representational syntax; now it is time to fulfill the promise and show
that the distinction is well motivated.

This chapter also introduces some important bug data that set the empirical stage for the
arguments of the following two chapters. In particular, the bugs indicate some general trends for
several key issues. First, the evidence indicates that induction should be biased so that fetch
patterns are fairly specific. Second, it indicates that impasses occur when matching fails due to
overspecific fetch patterns. Third, it indicates that induction should be biased so that test patterns
are fairly general. The best way to understand these general trends is to examine the evidence
itself. It concerns a group of bugs that will be called, for handy reference, the fetch bugs.

17.1 The fetch bugs

The fetch bugs that will be discussed here are clearly a product of incomplete learning. In
particular, it seems that students were tested just after they were introduced to borrowing.
Introductory borrowing is always exemplified using two-column problems, such as a:

           4           4                 4
    a.     5 ¹3    b.  5 ¹3  7    c.  7  5 ¹3
      -    1  9      - 1  9  2      - 2  1  9
           3  4        3  4  5        5  3  4

There is no logical reason against using multicolumn problems, such as b and c, but in the
textbooks that I've seen, they are never used in the initial borrowing lessons.

The general story for the fetch bugs goes like this: Suppose students abstract a highly specific
fetch pattern to describe where borrow's decrement goes. When they are given multicolumn
problems, such as b or c, their overly specific fetch patterns may not match. This triggers an
impasse, leading via various repairs to each of the fetch bugs. In order to verify this story, each of
the fetch bugs will be discussed in detail. The telltale Cartesian product pattern of impasses and
repairs will be uncovered. However, in order to make the exposition easier to follow, it will be
couched in terms of fetch patterns, impasses and repairs, just as if the point under discussion had
already been decided. After the evidence is exposed, opposing hypotheses will be evaluated.
Borrow-No-Decrement-Except-Last:

                       4
    a.  7  5 ¹3    b.  5 ¹3  7
      - 2  1  9      - 1  9  2
        5  4  4 X      3  4  5 ✓

When this bug's fetch pattern is induced from two-column borrow problems, the learner abstracts
the fact that the column that is borrowed from is adjacent to the column that is borrowed into. The
learner also abstracts that the borrow-from column is the leftmost column. This dual description is
overspecific. It is true of borrow-from columns only on two-column problems. That is what
ultimately leads to the bug. To make the discussion concrete, suppose that the fetch pattern
contains the fragment

    (Adjacent? G BFC BIC)
    (First? G BFC)

where G is the variable for the problem grid, BFC is the column to borrow from, and BIC is the
column to borrow into. The first relation means that the borrow-from column is adjacent to the
borrow-into column. The second relation means that the borrow-from column is the leftmost
column in the problem. Adjacent? and First? are always true when borrow problems are
two-column problems. That is why they are present in the fetch pattern. The learner apparently chose
a highly specific generalization of the two-column training examples.

On a three-column problem, such as a, the pattern fails to match. There is no column which
is both adjacent and leftmost. This failure causes an impasse. The above bug,
Borrow-No-Decrement-Except-Last, is generated by repairing the impasse with Noop. Hence, the bug just skips
the decrement if the pattern doesn't match. On a three-column problem with the borrow
originating in the tens column, as in b, the pattern matches just fine. The hundreds column is both
leftmost and adjacent to the borrow-into column. The match is exact, so no impasse occurs. The
decrement happens as it should. So far, the fetch bug story is borne out by the bug evidence.

Always-Borrow-Left:

        6              4
    a.  7  5 ¹3    b.  5 ¹3  7
      - 2  1  9      - 1  9  2
        4  4  4 X      3  4  5 ✓

This bug is derived the same way as the one just discussed, except that the impasse is repaired
differently. Instead of a Noop repair, the local problem solver uses another repair, called the Force
repair (because it forces the interpreter to choose when there is ambiguity). The Force repair finds
the closest match for the fetch pattern. When there are several closest matches, then the repair
chooses one of them. In the case of problem a, there are two closest matches for the fetch pattern.
One match binds the hundreds column to BFC, the column borrowed from. This binding satisfies
First? but leaves Adjacent? false. The other closest match binds the tens column to BFC.
This makes Adjacent? true and leaves First? false. When Force takes the first match, then the
bug Always-Borrow-Left is generated. If it takes the second match, then the correct borrow-from
placement is generated.

Two points are crucial for this bug and Borrow-No-Decrement-Except-Last: (1) The fetch
pattern is too specific; it has both Adjacent? and First?. (2) Impasses sometimes occur
whenever fetch patterns fail to match exactly.
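
The matching behaviour behind these two bugs can be sketched in Python. The encoding is an assumption made for illustration only: columns are numbered from 0 at the left, the two relations are modelled as simple index tests, and "closest match" is taken to mean the candidate binding for BFC that satisfies the most relations, which is one reading of the Force repair as described above rather than Sierra's actual matcher.

```python
# Hypothetical encodings of the two relations in the overspecific fetch pattern.
RELATIONS = {
    "Adjacent?": lambda bfc, bic: abs(bfc - bic) == 1,
    "First?": lambda bfc, bic: bfc == 0,
}


def score(bfc, bic):
    """Count how many relations hold when column bfc is bound to BFC."""
    return sum(1 for holds in RELATIONS.values() if holds(bfc, bic))


def exact_matches(n_columns, bic):
    """Bindings for BFC that satisfy every relation (what Fetch would return)."""
    return [c for c in range(n_columns)
            if c != bic and score(c, bic) == len(RELATIONS)]


def closest_matches(n_columns, bic):
    """Bindings for BFC that satisfy as many relations as possible (Force repair)."""
    candidates = [c for c in range(n_columns) if c != bic]
    best = max(score(c, bic) for c in candidates)
    return [c for c in candidates if score(c, bic) == best]


print(exact_matches(2, bic=1))    # two-column training problem
print(exact_matches(3, bic=2))    # problem a: borrow into the units column
print(closest_matches(3, bic=2))  # problem a under the Force repair
print(exact_matches(3, bic=1))    # problem b: borrow into the tens column
```

On problem a there is no exact match, so an impasse occurs. Noop then skips the decrement, while Force must choose between the hundreds column (index 0, yielding Always-Borrow-Left) and the tens column (index 1, yielding the correct placement).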

Two other fetch bugs differ from the two just described only in the kind of fetch patterns
they have. The new bugs' fetch patterns test the numerical relationships of the digits in the
borrow-from column. In two-column borrowing problems, the tens column has the property that T>B
(where T and B stand for the top and bottom digits of the column, as always). The following
pattern fragment is always true for the tens column of two-column borrowing problems:

    (... (Adjacent? G BFC BIC)
         (!Part BFC T)
         (!Part BFC B)
         (First? BFC T)
         (Last? BFC B)
         (LessThan? B T)
         (Not (LessThan? T B))
         (Not (Equal? T B)) ...)

The last three relations are the ones that matter. They specify B<T, T≮B, and B≠T. This pattern
will fail to match exactly on problems that require two adjacent borrows, such as

    a.  5  5  7    b.  5  5  7
      - 1  9  8      - 1  5  8
        3  5  9        3  9  9

The tens column falsifies one or more of the last three relations of the fragment above. Hence, the
fetch pattern will fail to match for the borrow-from for the first, units-column borrow. This failure
causes an impasse. The impasse leads ultimately to the following bug:



4^4 8 

Borrow-Don't-Decrement- a. 6 6*7 b. 6 6*7 c. 6 9*7

Unless-Bottom-Smaller: - 1 9 8 -16 9 -16 8

369X 409X 439V 



This bug results from taking the Noop repair to the impasse. It skips the decrement unless the
column is T>B. That is, it impasses when the borrow-from column is not exactly like the tens
column of two-column borrowing problems, with respect to the relative size of the column's digits.
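
Which of the three size relations fail for a given borrow-from column can be checked mechanically. The sketch below is illustrative only: the relation names are taken from the pattern fragment above, while the function name and the encoding of a column as a (top, bottom) pair of digits are assumptions made for the example.

```python
def unmatched_relations(top, bottom):
    """Return the size relations of the fragment that are false for a column."""
    relations = {
        "(LessThan? B T)": bottom < top,
        "(Not (LessThan? T B))": not (top < bottom),
        "(Not (Equal? T B))": top != bottom,
    }
    return [name for name, holds in relations.items() if not holds]


# Tens column of a two-column borrowing problem such as 63 - 19: exact match.
print(unmatched_relations(6, 1))   # []

# Tens column with top digit 6 and bottom digit 9 (two adjacent borrows).
print(unmatched_relations(6, 9))   # ['(LessThan? B T)', '(Not (LessThan? T B))']

# Tens column with top digit 6 and bottom digit 6 (T = B).
print(unmatched_relations(6, 6))   # ['(LessThan? B T)', '(Not (Equal? T B))']
```

In both adjacent-borrow cases at least one relation is unmatched, so an exact match fails and an impasse follows.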



3 

4 4 Z 

Borrow-Across- a. 6*6^7 b. 5 6^7 c. 6 9*7

Unless-Bottom-Smaller: - 1 9 8 - 1 6 9 - 1 6 8

269X 309X 439V 



A second bug is generated by taking the Force repair to this same impasse. The repair finds the 
closest matches to the fetch pattern. In problem b, there are just two closest matches. In problem 
a, there are three.

In problem a, the pattern matches the hundreds column if the adjacency relation is relaxed.
This is the match which generates the bug shown above. The pattern matches the tens column if
the two LessThan? relationships are relaxed. This match generates the correct borrow-from
placement. The pattern will also match the tens column if T and B are bound, respectively, to the
bottom and top digits of the tens column. That is, the match preserves the LessThan?
relationships by "inverting" the column. This means that it will try to decrement the bottom digit,
generating a very rare bug, Borrow-From-Larger (see appendix 1).

In problem b, where T = B in the tens, the first two matches are still available, but inverting
the column is no longer a closest match. Turning the column upside down doesn't make either
LessThan? relationship true. So, only two matches are good in problem b. Taking the match that
relaxes adjacency gives the solution shown in problem b. Taking the other match generates the
correct borrow-from placement. So, if the Force repair always chooses the match that relaxes
adjacency, the bug shown above is generated.

Summary 

The four fetch bugs described above fit into the familiar Cartesian product pattern that
indicates impasses being repaired. It is summarized by the following table:



The rows indicate the two impasses. Row one indicates trying to borrow from a column that is not
the leftmost column, and row two indicates trying to borrow from a column where T≤B. The two
columns indicate the two repairs, Noop and Force. The four cells of the table are abbreviations for
the four bugs. This Cartesian product pattern is clear and compelling evidence for the existence of
the impasses due to overspecific fetch patterns.

17.2 Test patterns ≠ fetch patterns

So far, no evidence has been presented that fetch patterns and test patterns are distinct.
Although it was assumed that they were different so that the syntax of the representation could be
defined in an earlier chapter, that assumption needs to be backed up. The hypotheses of the
preceding chapters allow test and fetch patterns to be the same. It could be that a single pattern is
used for two conceptually distinct purposes, but that the distinction is not reflected in the actual
procedural representation. This single-pattern approach is the one most often used in production
systems. The condition side of a production rule has patterns that serve both functions
simultaneously. If the pattern matches, the rule may be run. As a by-product of the match,
variables are bound for use in the action side of the rule. So production rules (and many other
pattern-invocation formalisms, e.g., Microplanner; Sussman et al., 1971) use one pattern for both
testing and fetching. However, there is fairly clear evidence that two distinct patterns are needed to
cover the bug data.

Single pattern and exact matching

The fetch bugs have overspecific patterns for fetching the column to borrow from. If only
one kind of pattern existed, then those same overspecific patterns would be used in testing whether
or not to borrow. Because the patterns are overspecific, they don't match exactly. When patterns
are used for tests in production systems, truth is equated with exact matching. That is, a pattern is
considered to be true only if all its relations match. Under the exact match convention, the fetch
bugs' patterns are false. Since the patterns that test whether or not to borrow are false due to their
overspecificity, the corresponding Borrow subprocedure won't be executed. In particular, the
patterns will be false in exactly the cases where the fetch patterns were found to be causing
impasses and repairs. There is a contradiction here. Because the fetch patterns are being executed,
the procedure has chosen to take the Borrow subprocedure. Yet by hypothesis, the patterns
governing its applicability were false. To generate the fetch bugs, either (1) the patterns are
matched closely instead of the usual exact matching, or (2) test patterns are distinct from fetch
patterns and furthermore, they are more general, so that they will be true (i.e., match exactly) under
conditions that would cause the fetch patterns to be overspecific. The first case will be shown to be
unworkable, leaving the second case as the conclusion.

Single pattern and closest match

There is a problem when closest matching is used for testing patterns. Since closest matching
rarely fails, it is infeasible to use failure to represent the false value of a pattern's test. The only
option is to compare the closeness of matches. The rule whose pattern matches the problem state
most closely is the rule to execute.

But this won't let the theory generate some of the fetch bugs. In particular, it won't generate
the fetch bugs that come from the second overspecific pattern mentioned above, the one that has
T>B, T≥B and T≠B in it. Under the single-pattern hypothesis, this pattern is used both to fetch
certain locations and to test whether or not to execute the Borrow rule. Suppose a problem that is
appropriate for generating the fetch bugs is presented. The pattern is matched. Because it is being
used to generate the bugs, this match will not be exact. In particular, the relations T>B, T≥B and
T≠B will fail to match. This does not make the Borrow rule inapplicable. Rather, the other rules
for processing a column must have their patterns matched in order to find out which rule's pattern
has the closest match. One such rule will be a rule that does ordinary, non-borrow columns. Its
pattern will not match exactly. The borrow column has T<B but the pattern has T≥B. So the
pattern relation T≥B will be false. The job of the interpreter is to decide which pattern matches
more closely: the borrow pattern, which has three unmatched relations, or the non-borrow pattern,
which has one unmatched relation. Note that the unmatched relation from the non-borrow pattern,
T≥B, is also one of the unmatched relations of the borrow pattern. Hence, any way of measuring
closeness of match that is monotonic (i.e., Q⊂P implies |Q|<|P|) will prefer the non-borrow pattern
over the borrow pattern. Because monotonicity is widely held to be an axiom of human measures
of similarity (Tversky, 1977), we can assume that the interpreter will judge that the non-borrow
pattern matches more closely than the borrow pattern. Hence, the Borrow subprocedure will not be
called at exactly the times when the fetch bugs show that it is being called. The theory can't
generate the fetch bugs if closest matching is used to test rule applicability. The single-pattern
hypothesis cannot be salvaged by postulating that closest matching be used to test for applicability.
The only way to get the theory to generate the fetch bugs is to assume that Borrow's test pattern is
a different pattern from its fetch pattern.
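
The monotonicity argument above can be illustrated with a small Python sketch. It is not part of the theory's implementation: the two sets of unmatched relations are the ones named in the text, while the two distance functions and the particular weights are hypothetical examples of monotonic measures.

```python
# Unmatched relations under the single-pattern hypothesis (from the text).
borrow_unmatched = {"T>B", "T>=B", "T!=B"}
non_borrow_unmatched = {"T>=B"}


def count_mismatch(unmatched):
    """Distance = number of unmatched relations."""
    return len(unmatched)


def weighted_mismatch(unmatched, weights):
    """Distance = sum of positive per-relation weights (weights are made up)."""
    return sum(weights[r] for r in unmatched)


def closer_pattern(distance):
    """Name the pattern that the given distance function judges closer."""
    d_borrow = distance(borrow_unmatched)
    d_non_borrow = distance(non_borrow_unmatched)
    return "non-borrow" if d_non_borrow < d_borrow else "borrow"


weights = {"T>B": 0.5, "T>=B": 2.0, "T!=B": 0.1}  # arbitrary positive weights

# The non-borrow mismatches are a proper subset of the borrow mismatches.
print(non_borrow_unmatched < borrow_unmatched)                   # True
print(closer_pattern(count_mismatch))                            # non-borrow
print(closer_pattern(lambda u: weighted_mismatch(u, weights)))   # non-borrow
```

Because one set of mismatches strictly contains the other, any measure that adds a positive cost per mismatch ranks the non-borrow pattern as closer, whatever the individual costs are.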



17.3 Formal hypotheses

The conclusions of this chapter are summarized in two hypotheses: 
Two patterns 

The representation uses different patterns for testing rule applicability and
for focus shifting: Test patterns are used in choosing which OR rule to
execute. Fetch patterns are used to shift the focus of attention (data flow).

Test pattern match 

A test pattern is considered to be true if and only if it matches exactly (i.e., all 
its relations are true in the current problem state).

In addition to these hypotheses, the chapter yielded several general observations that will be honed
in the following chapters: (1) In some cases, such as the fetch bugs, test patterns are more general
than fetch patterns. (2) Fetch patterns tend to be highly specific. This can be assumed to be the
result of a bias in their induction. (3) Impasses sometimes occur when fetch patterns do not match
exactly due to overspecificity.

Chapter 18
Fetch Patterns

This chapter discusses fetch patterns. There are two key issues to resolve. The first concerns
biasing induction: How specific are fetch patterns? At the end of pattern induction, the set of
patterns that are consistent with the examples is usually very large (about 2™). The patterns range
from large patterns (100 relations) that are highly specific, to small patterns (one or two relations)
that are highly general. The first issue of this chapter concerns which of these patterns are
preferred by learners for fetch patterns. The second issue concerns a set of bugs, called the fetch
bugs, which were introduced in chapter 17. They seem to be caused by fetch patterns that are
overly specific. To account for the fetch bugs, the theory has to describe how their matching
triggers impasses.

18.1 Version spaces 

This chapter contrasts two solutions to these issues. However, before they can be stated, a
new formalism must be introduced. It is a simple induction technique, dubbed version spaces by
Mitchell (1982). The goal of pattern induction is to find all conjunctive patterns that are consistent
with the given instances, where instances in this case are problem states. To make the discussion
easier to follow, it will be assumed that the pattern being induced will be used as the test pattern
for some rule. For test patterns, there are two kinds of instances, positive and negative. Positive
instances are problem states that the target pattern should be true of (match exactly in). Negative
instances are problem states that the target pattern should be false of (should not match exactly in).
To make version spaces clear, it helps to restate the pattern induction problem in logical notation.

The target pattern, call it T, should match exactly in all positive instances. Let positive
instance i be represented by P_i, the set of all literals that are true of it. (A literal is a relation or a
negated relation.) Then P_i → T, where → means logical implication. That is, T is true in problem
state i whenever P_i implies T. Similarly, let negative instance j be represented by N_j, the set of all
literals that are true of it. Then N_j → ¬T. T is false in problem state j whenever N_j implies not T.
So the induction problem is to find all T such that:

T is in the pattern language, and
P_i → T for all positive instances i, and
N_j → ¬T for all negative instances j.

But this is logically equivalent to finding all T such that:

T is in the pattern language and P_i → T → ¬N_j for all i, j.

Implication is of course a partial order on propositions. Let's say that X → Y means that Y is above
X in the partial order. Then T is above the P_i and below the ¬N_j. One may be able to use an
efficient representation by saving the least upper bound (LUB) of the P_i instead of the P_i, and the
greatest lower bound (GLB) of the ¬N_j instead of the ¬N_j. To make this representation equivalent
to saving the states, the LUB and GLB will have to be filtered to remove elements that do not
conform to the transitive implication relation just stated above. Mitchell calls this representation a
version space (Mitchell, 1982). The filtered LUB is designated S. The filtered GLB is designated G.
To put it differently:

G is the set of maximally general generalizations.
S is the set of maximally specific generalizations.

Unless the kinds of implications allowed by the pattern language are extremely limited, S and G
will be enormous, making the version space representation worse than merely keeping the P_i and
the ¬N_j. However, in conjunctive pattern languages there is only one analytic implication:
(X∧Y) → X. Thus, the LUB can be computed by deleting conjuncts from the P_i's descriptions
(assuming that the problem state is described by the set of all relations that are true of the
problem state). For instance, if P_i is A∧B∧C, then the only generalizations of it that are in the
pattern language are:

B∧C, A∧C, A∧B, A, B, C, true
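
The conjunct-deletion view of generalization is easy to make concrete. The following Python sketch is an illustration under a simplifying assumption that is not in the text: patterns are treated as sets of variable-free conjuncts, so the least upper bound of several instances is just the intersection of their relation sets. The function names are hypothetical.

```python
from itertools import combinations


def generalizations(conjuncts):
    """All proper generalizations of a conjunction, found by deleting conjuncts."""
    items = sorted(conjuncts)
    result = []
    for size in range(len(items) - 1, -1, -1):
        for subset in combinations(items, size):
            # The empty conjunction is the pattern that is always true.
            result.append(" ∧ ".join(subset) if subset else "true")
    return result


def lub(instances):
    """Most specific conjunction implied by every instance (variable-free case)."""
    return set.intersection(*(set(inst) for inst in instances))


print(generalizations({"A", "B", "C"}))
# ['A ∧ B', 'A ∧ C', 'B ∧ C', 'A', 'B', 'C', 'true']

print(sorted(lub([{"A", "B", "C"}, {"A", "C", "D"}])))
# ['A', 'C']
```

With pattern variables the intersection is no longer enough, because the variables of one pattern must first be paired with those of the other.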

Because the pattern language is so strongly constrained, the logical implications are simple. Hence,
the computation of the LUB is quite simple as well. The computation of G is only a little more
complex. One has to "filter" the GLB against S before actually computing it, so to speak. The
algorithms are documented in Mitchell (1982) and Cohen and Feigenbaum (1983).

There are still the non-analytic implications to take into account, such as
(0 x) → (ID/ELT x). This implication states that if x is a zero then x is an identity element. It
and many other non-analytic implications are implicit in the grammar. In order to dispense with
computing non-analytic implications during induction, Sierra computes all the non-analytic
implications on the states prior to induction. Thus, whenever (0 G007) occurs in a P_i or an N_j,
(ID/ELT G007) is made to occur as well. (Actually, this falls out automatically from parsing the
problem state bottom-up using the grammar.)

Constraints for the sake of efficiency

Although it is slightly off the main topic, it is worth mentioning that Sierra's implementation
of version spaces makes some compromises for the sake of efficient computation. Because patterns
have pattern variables, the LUB and GLB computations are NP-hard. More specifically, if patterns P
and Q have n variables each, then computing their LUB is O(n^n). These combinatorics reflect the
usual AI matching problem: Each variable in P can be paired with any variable in Q. One way to
deal with it is to use small n. Winston's blocks world inductions rarely used n larger than 5 or 6
(Winston, 1975). Using small n is impossible in Sierra's case because prediction of the data requires
problem states with 10 to 30 objects, and hence the patterns have about that many variables. A
second solution is to impose prior constraints on which matches will be considered. Sierra uses two
constraints.

First, it is assumed that pattern variables have an implicit inequality relationship between
them. That is, distinct variables must match distinct objects. Hence, when finding the LUB of two
patterns, distinct variables must be mapped to distinct variables. This lowers the combinatorics to
O(n!). There is actually some empirical evidence for this constraint, but it won't be presented here.

Second, it is assumed that the part-whole hierarchy established by the grammar is inviolate.
That is, if x is paired with x' during the LUB calculation of two patterns P and P', and
(!Part y x) is in pattern P, then y can only be bound to y' when (!Part y' x') is in pattern
P'. This cuts the complexity of the matching down to O(B!'^ where B is the branching factor
of the part-whole trees, typically about three or four. When this constraint is turned off in Sierra,
an LUB computation that normally takes 30 seconds takes well over two hours. A GLB computation
that normally takes a few minutes takes several days. This constraint, or something like it, is a
practical necessity.
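
The effect of the first constraint on the number of variable pairings can be checked by brute force for small n. The sketch below only counts candidate pairings; it is not Sierra's matcher, and the function names are hypothetical.

```python
from itertools import permutations, product
from math import factorial


def unconstrained_pairings(n):
    """Each of n variables in P may pair with any of n variables in Q."""
    return sum(1 for _ in product(range(n), repeat=n))


def distinct_pairings(n):
    """Distinct variables in P must pair with distinct variables in Q."""
    return sum(1 for _ in permutations(range(n)))


for n in (2, 3, 4, 5):
    # Brute-force counts agree with the closed forms n**n and n!.
    assert unconstrained_pairings(n) == n ** n
    assert distinct_pairings(n) == factorial(n)
    print(n, n ** n, factorial(n))

# For pattern sizes in the range the text mentions, use the closed forms only.
print(20 ** 20 // factorial(20))   # how many times larger n**n is at n = 20
```

Even n! grows far too quickly for patterns of 10 to 30 variables, which is why a second, structural constraint is still needed.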

18.2 Two hypotheses

With the basics of version spaces presented, the two main issues of this chapter can be
discussed: how to induce fetch patterns, and how to match them. Two hypotheses are advanced
concerning the kind of inductive bias that determines fetch patterns.

1. Topological bias: The learner prefers maximally specific generalizations. That is, if <S,G> is
the version space of fetch patterns, the learner chooses from the S set. The name
"topological" is applied because the test for maximality is a simple topological one. (In
particular, when a pattern is viewed as a labelled directed graph, one pattern generalizes
another if it is a proper subgraph of the other. A maximally specific pattern is one that is not
a subgraph of any larger pattern. This definition is the one used to maintain the version
space.)

2. Teleological bias: The learner has a set of "teleological rationalizations" that are used to judge
the plausibility of the various generalizations that are consistent with the examples. The bias
is to choose generalizations that seem to have the most coherent rationalizations with respect
to what the learner believes the purposes of the procedure are. For instance, the notion of
"boundaries" might be the foundation of a teleological rationalization that requires decrements
to be in the leftmost column. The rationalization goes as follows: The leftmost column is a
boundary of the grid. Boundaries are often subject to special rules, in general, and especially
in grid-like arrangements (e.g., checker boards, chess boards, Go boards, Asteroid fields,
basketball courts, football fields, etc.). The choice of "leftmost" is rational since the tens
column of a two-column problem is a boundary case. Therefore, it is especially worth noting.

The difference between the teleological and topological hypotheses is subtle. Neither hypothesis is
flawless, nor do the data side convincingly with one or the other. Ultimately, the choice is made on
grounds of tailorability in favor of the topological hypothesis.

The second issue discussed in this chapter concerns what kinds of fetch pattern mismatching
trigger impasses. Patterns can mismatch several ways. A pattern can fail to match exactly, leaving
some of its relations unmatched. A pattern can match ambiguously: There may be several ways to
bind its variables to problem state objects, and each binding satisfies the whole pattern. Mismatches
such as these two can be used to trigger impasses. On the other hand, the interpreter can treat
them as normal events. For instance, the interpreter may not care if a few relations are unmatched
if its matching convention is to take the match that maximizes the set of matched relations. This
issue involves figuring out a combination of conventions for matching and triggering impasses that
will maximize the coverage of the data. Two hypotheses are considered:

1. Exact match: The interpreter matches fetch patterns exactly. If not all of the relations of a
pattern are matched, an impasse occurs.

2. Closest match: The interpreter matches fetch patterns closely. That is, the matcher chooses
bindings for the pattern variables that maximize the set of relations that are matched. If the
fetch is ambiguous in that more than one closest match exists, then an impasse occurs.

The bias issue and the matching issue are discussed together since they turn out to be related. It
will be shown that the topological bias seems the better account for the learners' bias in inducing
fetch patterns. Closest matching is a better account for the use of fetch patterns during
interpretation and local problem solving.

18.3 The pattern focus hypothesis

The topological hypothesis as it stands is incomplete. It needs to say something about how
large an area to focus on when constructing most specific patterns. The grammar is, in principle,
powerful enough to build a parse tree not only for the subtraction problem, but for the whole page
containing the problems, and perhaps even larger spatial areas. Taken literally, the hypothesis that
fetch patterns be as specific as possible implies that patterns mention not only the column's position
in the current problem, but also the problem's position in the current page, the page's position on
the desk, and so forth. In principle, fetch patterns could be infinite. Including this much context in
the patterns causes the procedure to impasse whenever the problem is not set in exactly the same
context as the training exercises. Clearly, this is not what people do. A constraint is needed to
limit the size of the fetch patterns. The constraint will be called the pattern focus hypothesis.

A precise formulation of the pattern focus hypothesis can be obtained by examining the bug
Always-Borrow-Left. The preceding chapter described its derivation from two-column borrowing
problems. The discussion centered on the fetch pattern for borrow-from. The fetch pattern for
borrow-into is equally interesting. The following illustrates a typical example problem's states just
before each of the two actions:

4 

a. 6 3 b. 6 3 

-1 9 -1 9

Problem state a is just prior to the borrow-from action. The learner induces the fetch pattern for
borrow-from from states like state a. Problem state b is the sort of state used for inducing
borrow-into's fetch pattern. The preceding chapter showed that borrow-from's fetch pattern is often
specific enough to cause impasses on three-column problems. The essential problem was that the
pattern's induction from two-column problems led to incorporating the fact that the borrow-from
column is the leftmost column of the problem and it is also adjacent to the column that originated
the borrow. If the induction of the fetch pattern for borrow-into is similarly biased, then it too
should cause impasses because it expects two-column problems. But no such impasses occur.
Borrow-into seems not to have the overspecific pattern that borrow-from has. This is a key piece of
evidence for the pattern focus hypothesis.

Figure 18-1 shows the two fetch patterns in question prior to applying the focus pattern
hypothesis. Part-whole trees are used for legibility. A part-whole tree is a way of displaying
patterns that emphasizes !Part relationships of the pattern by drawing them as links in a tree.
Pattern variables are shown as tree nodes. The node label is the variable's name concatenated with
the main categorical relation on that variable. For instance, AC2 is the variable for the whole
units column, which has two parts. C2 matches the top part of the column, which contains the two
digits 3 and 9. A2 matches a blank, the answer place for the units column. A little bit of focusing
has already been applied in that the two patterns have been pruned to mention only the columns of
the current problem. With no focusing at all, the patterns would be infinite. The pattern variables
that are input-output variables are boxed. By "input-output variable," I mean that the variable is
either

1. an input variable to the fetch pattern: The variable is an argument of the goal that this
pattern's rule is under. Thus, this variable will be bound before pattern matching begins.

2. an output variable of the fetch pattern: The variable is used by the action of the rule that the
fetch pattern is in. The whole point of the fetch pattern is to get its output variables bound.



Figure 18-1

Part-whole trees displaying patterns for borrow-from (A) and borrow-into (B).

That is, if a fetch pattern is thought of as a function for retrieving objects from the problem state,
then the input-output variables are the function's inputs and outputs. There are essentially just
three input-output variables in the figure. AC2 is an input variable because it is the argument of
the goal Borrow in both patterns, and these patterns are on rules beneath the Borrow goal. This
variable matches the whole units column. In figure 18-1a, the Borrow-from subgoal is passed the
output variable T1 as an argument. The variable matches the top digit in the tens column in
problem state a, above. In figure 18-1b, the subgoal Borrow-into is passed the output variable T2
as an argument. It matches the top digit of the units column in problem state b, above.

The only major difference between the two patterns of figure 18-1 is where the input-output
variables are located. This provides the distinction that is needed to drive the pattern focus
hypothesis: A fetch pattern has the smallest part-whole tree that will span its input-output variables.
The focused fetch pattern for 18-1a is the pattern corresponding to the tree headed by G. The
focused fetch pattern for 18-1b corresponds to the tree headed by AC2.

When the borrow-into pattern is focused this way, it mentions only the units column. Hence,
it will not be as specific as the borrow-from pattern, which mentions both columns. In particular,
the overspecificity of the borrow-from pattern is absent in the borrow-into pattern. The
borrow-from pattern insists that the two columns be both adjacent and boundary columns (i.e.,
leftmost and rightmost, respectively). The borrow-into pattern doesn't care about adjacency since it
mentions only one of the two columns. Whereas the overspecificity of the borrow-from pattern will
cause impasses, the borrow-into pattern is now general enough that it can match without causing an
impasse. This explains why there are no bugs that are the equivalent of Always-Borrow-Left for
borrow-into patterns. The focus pattern hypothesis has removed most of the overspecificity of the
borrow-into patterns.

This formulation of the focus hypothesis makes intuitive sense. The focus of attention is no
larger than the problem solver needs it to be in order to discriminate among various bindings of the
output variables. If the focused pattern of figure 18-1b were larger, say including the tens column
as well as the units (i.e., the focused pattern tree headed by G), then the tens column could
potentially match several different ways without having any effect at all on the bindings of the units
column section of the pattern. These matches are superfluous, given that all the solver has to do is
choose the top digit of the current column, which is the units column. It is pointless to look at the
tens column in order to figure out how to match the units.

18.4 Teleological rationalizations

The teleological bias is based on applying teleological rationalizations to choose among the
various generalizations that induction offers. In one sense, some teleological rationalizations are
already embedded in the theory. As mentioned earlier, a teleological rationalization involving
boundaries (or perhaps corners) would generate the leftmost relationship that is critical to generating
Always-Borrow-Left. This idea, that the ends of sequences are important, is embedded in the
relationship between the grammar and the pattern relations. The spatial relation First? is added
to patterns when the appropriate aggregate objects are sequences. However, this convention is just
a choice on the theory's part. It doesn't have to be there for logical reasons. The intuitive
motivation for making it a part of the grammar definition is the same as the teleological
rationalization. First? is a defined pattern relation just because boundaries are often important
and salient for sequential arrangements. There is no similar motivation for Second? (a relation
true of the second item in a sequence), so it is not a spatial relation. The point here is that
teleological rationalizations that involve spatial relationships are represented in the grammar
definition.

This makes the difference between the teleological and topological hypotheses quite subtle. It
essentially means that almost any spatial teleological rationalization the theorist deems necessary can
be embedded in both systems. However, under the topological hypothesis, such teleological
rationalizations are installed in the grammar. It can be shown that this entails that their effects will
be felt more widely.

Under the teleological hypothesis, the rationalizations are in some kind of knowledge base
that acts as a filter on each pattern separately. This allows the rationalizations to potentially act
together, filtering individual patterns in complex ways. On the other hand, under the topological
hypothesis, First? must be in all sequence-bearing patterns if it is in any. Under the teleological
hypothesis, its presence can be controlled. In short, while both hypotheses allow certain teleological
rationalizations to be captured, the topological hypothesis captures them in a way that introduces
less tailorability into the theory.

The same difference in tailorability comes out clearly in the case of the bug
Borrow-Don't-Decrement-Unless-Bottom-Smaller. The bug's derivation was discussed in chapter 17.
Like Always-Borrow-Left, it has an overspecific fetch pattern. The crucial extra relationship is T>B,
applied to the column to be borrowed from. This relationship is always true in two-column borrowing
problems, the kind used in the initial borrowing lesson, but it is not true whenever the problem
requires adjacent borrows. Hence, adjacent borrow problems cause impasses, which are what led to
the inference that T>B was in the fetch pattern. One rationalization for its presence concerns the
fact that the borrow-from goal has to choose between the top and bottom digits in deciding where
to perform the decrement. The rationalization uses T>B as the way to distinguish the two digits on
the rationalization that "the digit to make smaller is the digit that has more to begin with."
Admittedly, this is not a particularly strong rationalization, but it will do to illustrate the difference
in tailorability. The teleological hypothesis rationalizes T>B by using the fact that the pattern is a
fetch for a decrement. If it was fetching for an increment, this particular rationalization would not
apply. The rationalization is sensing not only the current problem state, but it is also sensing the
intended use for the output of the fetch pattern. Clearly, this rationalization has acquired a good
deal of control over whether the pattern will or will not have T>B in it. This extra power is the
main difference between the teleological and topological hypotheses.

So far, I have yet to see any good use for this extra power. Any fetch that has a nice
teleological rationalization is adequately derived under the topological hypothesis. Consequently, it
seems wisest to base fetch pattern biasing on topological considerations. The hypothesis that
captures this is:

Maximally specific fetch patterns

Given that <S, G> is the version space for a fetch pattern, InduceFetch chooses any
pattern in S as the fetch pattern.

S stands for the set of maximally specific generalizations for a version space, and G stands for the
set of maximally general generalizations. The set G is not used in the case of fetch patterns.

The pattern focus hypothesis interacts with the fetch pattern hypothesis. It limits the size of
candidate fetch patterns by restricting the size of their part-whole tree.

Focus pafiem 

lf<S, ^> is the version space for a fetch pattern* and the fetch pattern is beneath goal G, 
and the fetch pattern will provide arguments to goal SG, a subgoal of G, then let I (for 
input) be the set of pattern variables con csponding to goal G's arguments and 0 (for 
outpul)be tlieset of all variables for providing arguments to SG> The part-whole tree of 
a pattern in S is the minimal tree necessary to span the set lUO. Variables outside this 
part-whole tree are dropped from the pattern, along with all relations that mention them> 
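
The focus pattern hypothesis can be illustrated with a small sketch. It is not Sierra's code: the
part-whole tree, the variable names (prob, h, t, u, t_top, u_top) and the relation names are
hypothetical, and relations are assumed to be tuples of a name followed by pattern variables.
The sketch keeps only the smallest subtree spanning the input and output variables and drops
every relation that mentions a variable outside it.

```python
def spanning_subtree(parent, keep):
    """Nodes of the smallest subtree of a rooted tree that spans `keep`.

    `parent` maps every node to its parent; the root maps to None.
    `keep` must be non-empty.
    """
    def path_to_root(node):
        path = []
        while node is not None:
            path.append(node)
            node = parent[node]
        return path

    paths = [path_to_root(n) for n in keep]
    # Nodes that lie on every path to the root.
    shared = set(paths[0]).intersection(*(set(p) for p in paths[1:]))
    # Walking upward from one node, the first shared node is the lowest one.
    top = next(n for n in paths[0] if n in shared)
    nodes = set()
    for path in paths:
        nodes.update(path[: path.index(top) + 1])
    return nodes


def focus(relations, parent, inputs, outputs):
    """Drop variables outside the spanning subtree and relations mentioning them."""
    kept = spanning_subtree(parent, sorted(set(inputs) | set(outputs)))
    return kept, [r for r in relations if all(v in kept for v in r[1:])]


# Hypothetical part-whole tree: a problem made of columns, columns made of cells.
PARENT = {"prob": None, "h": "prob", "t": "prob", "u": "prob",
          "t_top": "t", "u_top": "u"}
RELATIONS = [("LeftAdjacent", "t", "u"), ("Leftmost", "h"), ("Zero", "u_top")]
kept, relations = focus(RELATIONS, PARENT, inputs=["u"], outputs=["t_top"])
```

With the units column as input and the top cell of the tens column as output, the hundreds
column and the units column's top cell fall outside the spanning tree, so the relations that
mention them are dropped.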

18.5 Exact matching vs. closest matching

AI distinguishes between two kinds of conjunctive pattern matching. Exact matching finds
bindings for the pattern variables that make all the pattern relations true. If there is no such
binding, then matching fails. The pattern is said to be overconstrained or false. Closest matching
finds bindings that maximize the subset of pattern relations that the bindings make true. Closest
matching almost never fails. There is almost always some binding that makes at least one pattern
element true. Closest matching actually stands for a class of matching conventions since there is
some latitude in what it means for a subset of pattern relations to be maximal. This section
discusses whether fetch patterns should be matched closely or exactly. There is a great deal of
evidence that closest matching is better.

Fetch patterns need generalization long after they are acquired

When induction is biased toward specific patterns and induction is incremental, then it is
inevitable that late occurring lessons will need to generalize patterns that were created earlier in the
lesson sequence. The lesson that creates a pattern will do so using rather constrained, simple
problems. When later lessons use more complicated problems, the simple patterns won't match
exactly; they only match simple problems. To renovate the older material, the solver must either
generalize the patterns so that they will now match exactly, or it must habitually use closest
matching. This entailment will be clarified with an illustration.

One of the first subtraction lessons in the curriculum teaches how to do two-column problems.
The students already know how to do single-column problems, such as a below; they are taught
how to do two-column problems, such as b:



This involves inducing several patterns. The pattern of interest is the fetch pattern that retrieves the
tens column. This pattern is overspecific in the same way that Borrow-from's fetch pattern was
overspecific in the generation of Always-Borrow-Left. It specifies that the retrieved column be both
the leftmost column and the left-adjacent column.

The evidence for closest matching appears when the students are shown their first three-column
problem. Some students are able to induce the main column loop from such examples.
That is, they are able to install a tail recursion into their procedure. To do so, they must install a
subprocedure under one of the two currently existing column processing subprocedures. When they
get done, the old column processing steps will now be called even on three-column problems. But on
such problems, the old fetch patterns will not match exactly. The fetch patterns of the old column
processing subprocedures were tuned for two-column problems. Yet now the students are
processing three-column problems, presumably without local problem solving. Consequently, one
must assume either that the lack of an exact match doesn't bother the interpreter (i.e., closest match
is the normal interpretation of fetch patterns) or an old pattern was revised by the three-column
lesson. The assimilation conjecture (section 10.1) rules out the latter position. Hence, one is left to
conclude that closest matching is the usual way to interpret fetch patterns.

There are other arguments for closest matching, but they require describing the inducer in
more detail than it has so far been described. Suffice it to say that closest matching is required
when the old procedure is used to "parse" the problem state sequence of the worked example
exercise (see section 19.1 on skeletons and finding them).

Closest matching is what Sierra uses. I have found no empirical problems with it. The only
drawback is the inconvenience caused by the fact that it is roughly an order of magnitude slower
than exact matching (closest matching is an NP-hard problem; in fact, it can use the same algorithm
as the calculation that finds the LUB of two patterns). These considerations motivate the following
hypothesis:

Fetch pattern matching

Let $\theta$ be a binding of the fetch pattern's variables to objects of the current problem state such
that all input variables have their correct bindings. Let $P_\theta$ be the subset of relations in fetch
pattern $P$ that are true (satisfied) by $\theta$. Then $\theta$ is a valid match for the fetch pattern only if
(1) there is no $\theta'$ such that $P_\theta \subseteq P_{\theta'}$ and $P_\theta \neq P_{\theta'}$, and (2) all the part-whole
relations of $P$ are in $P_\theta$.

The last clause of the hypothesis says that part-whole relations can't be relaxed by closest matching.
This stipulation tames much of the combinatorial complexity of the match. It reduces complexity
from $O(v!)$, where $v$ is the number of pattern variables in $P$, to $O((b!)^n)$, where $b$ is the branching
factor of the part-whole tree and $n$ is the number of non-leaf nodes in the part-whole tree.
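
The contrast between exact matching and the closest matching of the hypothesis can be sketched
by brute force. This is an illustration only, not Sierra's matcher: the relation names (IsColumn,
Leftmost, LeftAdjacent), the use of IsColumn as a stand-in for a part-whole relation, and the
assumption that distinct variables bind to distinct objects are all mine. The pattern is the
overspecific two-column pattern discussed above, which asks for a column that is both leftmost
and left-adjacent to the current column.

```python
from itertools import permutations


def candidates(variables, inputs, relations, objects, facts):
    """Yield (binding, satisfied relations) with the input variables held fixed."""
    free = [v for v in variables if v not in inputs]
    pool = [o for o in objects if o not in inputs.values()]
    for objs in permutations(pool, len(free)):
        theta = {**inputs, **dict(zip(free, objs))}
        p_theta = frozenset(r for r in relations
                            if (r[0], *(theta[v] for v in r[1:])) in facts)
        yield theta, p_theta


def exact_match(variables, inputs, relations, objects, facts):
    """Bindings that make every relation of the pattern true."""
    return [t for t, p in candidates(variables, inputs, relations, objects, facts)
            if p == frozenset(relations)]


def closest_match(variables, inputs, relations, part_whole, objects, facts):
    """Bindings whose satisfied set is subset-maximal and keeps all part-whole relations."""
    cands = list(candidates(variables, inputs, relations, objects, facts))
    return [t for t, p in cands
            if part_whole <= p                      # clause (2)
            and not any(p < q for _, q in cands)]   # clause (1)


PATTERN = [("IsColumn", "nxt"), ("Leftmost", "nxt"), ("LeftAdjacent", "nxt", "cur")]
PART_WHOLE = frozenset({("IsColumn", "nxt")})
VARS, INPUTS = ["cur", "nxt"], {"cur": "U"}

two_col = {("IsColumn", "T"), ("IsColumn", "U"),
           ("Leftmost", "T"), ("LeftAdjacent", "T", "U")}
three_col = {("IsColumn", "H"), ("IsColumn", "T"), ("IsColumn", "U"),
             ("Leftmost", "H"), ("LeftAdjacent", "H", "T"), ("LeftAdjacent", "T", "U")}

exact_two = exact_match(VARS, INPUTS, PATTERN, ["T", "U"], two_col)
exact_three = exact_match(VARS, INPUTS, PATTERN, ["H", "T", "U"], three_col)
closest_three = closest_match(VARS, INPUTS, PATTERN, PART_WHOLE, ["H", "T", "U"], three_col)
```

On the two-column problem the pattern matches exactly. On the three-column problem exact
matching fails, while closest matching returns two bindings, one for the hundreds column and
one for the tens column, neither of which satisfies a superset of the other's relations.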

18.6 When does mismatching trigger repair?

Having determined that closest matching is the normal way to match fetch patterns, the
question of triggering impasses can be quickly dispatched. In chapter 17's discussion of the fetch
bugs, it was shown that some kind of matching failure is triggering local problem solving. This was
shown by the existence of Noop repairs (i.e., the bugs Borrow-Don't-Decrement-Unless-Bottom-Smaller
and Borrow-No-Decrement-Except-Last) alternating with Force repairs (i.e., the bugs
Always-Borrow-Left and the compound bug Borrow-Skip-Equal & Borrow-Skip-Top-Smaller).

One potential candidate for triggering is that the fetch patterns fail to match exactly. But
clearly, if closest matching is the norm, this cannot be the impasse condition.

When the fetch patterns of the fetch bugs are matched closely, it turns out that they each
match ambiguously. That is, there is more than one way to bind pattern variables to objects in the
current problem state. Moreover, in each case, the bindings of the output variables are ambiguous.
For instance, the pattern matcher might report back that either the tens or the hundreds column
would be okay for borrowing from. This ambiguity seems to be the reason for calling in the local
problem solver. To put it intuitively, if the interpreter can't decide between several ways to bind a
subgoal's arguments because the fetch was ambiguous, then an impasse occurs. This discussion
motivates the following hypothesis:

Ambiguity impasse 

If a fetch pattern matches ambiguously so that the output variables have two or more
distinct bindings, then an ambiguity impasse is triggered.
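
A minimal sketch of the impasse test, assuming the matcher hands back its valid matches as
dictionaries from pattern variables to objects. The exception name, the function name and the
variable names cur and nxt are hypothetical. The case of no match at all is not covered by the
hypothesis and is treated here simply as a caller error.

```python
class AmbiguityImpasse(Exception):
    """Raised when a fetch leaves the output variables with rival bindings."""


def fetch_outputs(matches, outputs):
    """Return the agreed output bindings, or signal an ambiguity impasse."""
    distinct = {tuple(m[v] for v in outputs) for m in matches}
    if len(distinct) >= 2:
        raise AmbiguityImpasse(sorted(distinct))
    if not distinct:
        # Not covered by the hypothesis; treated as a caller error in this sketch.
        raise ValueError("no match supplied")
    return dict(zip(outputs, distinct.pop()))


unambiguous = fetch_outputs([{"cur": "U", "nxt": "T"}], ["nxt"])
try:
    fetch_outputs([{"cur": "U", "nxt": "T"}, {"cur": "U", "nxt": "H"}], ["nxt"])
    impasse = None
except AmbiguityImpasse as err:
    impasse = err.args[0]
```

Matches that differ only in non-output variables do not trigger the impasse, since only the
output bindings are compared.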

18.7 Summary 

There is a theme which unifies these disparate results concerning fetch patterns. The basic
problem that fetch patterns solve is disambiguating which of the many visible objects an action
should use. Given this charter, it makes sense that it is only when the fetch pattern fails to
disambiguate an output variable that the solver decides that something is wrong and therefore some
local problem solving is called for. This story provides intuitive motivation for the ambiguity
impasse hypothesis. The theme of disambiguation offers explanation for the other hypotheses as
well.

In order to maximize the fetch pattern's power to discriminate, the learner remembers
everything about the lesson's examples that might prove useful in fetching. It does so in order that
problem solving can approximate the lesson situation as closely as possible when it chooses bindings
for the fetch pattern. That is, the learner remembers the most specific fetch patterns possible. This
motivates the maximally specific fetch patterns hypothesis and the fetch pattern matching
hypothesis.

However, the learner leaves behind detail if it can be assured that the omitted information
won't help the solver do disambiguation of the output variables. Hence, it leaves behind
information from "distant" parts of the pattern since such information is only weakly related to
disambiguating the output variables. This motivates the focus pattern hypothesis.

Chapter 19
Test Patterns and Skeletons



Several related issues are discussed in this chapter. The first concerns test patterns. As with
fetch patterns, inductive learning is usually not sufficient to uniquely specify a test pattern because
the examples used in lessons are not variegated enough. A typical lesson would leave an unbiased
inducer to decide among several million test patterns. This would generate a much wider variety of
bugs than are observed. Apparently, human learners have some bias in their choice of test patterns.
One of this chapter's topics is discovering that bias and making it precise. Two hypotheses are
proposed and contrasted.

1. The topological hypothesis is that learners choose test patterns that are the maximally general
generalizations of the possible test patterns. That is, if <S,G> is the version space of test
patterns, the learner chooses from the G set. The name "topological" is applied because the
test for maximality is a simple topological one. (In particular, one pattern generalizes another
if it is a proper subgraph of the other (see section 18.1). A pattern is a maximally general
pattern if there is no smaller pattern that is a subgraph of it. This definition is the one used
to maintain the version space.)

2. The teleological hypothesis is that the learner has a set of teleological rationalizations that
sanction only test patterns that fit into common sense, general notions of what the purposes of
steps typically are. For the sake of discussion, teleological rationalizations are formalized as
step schemata (Goldstein, 1974), which express the general form and purpose of archetypical
steps. For instance, one particularly important step schema is the preparation step schema: it
rationalizes a step as having the purpose of preparing for some existing "main" step. The
preparation step schema constrains the choice of test pattern. It might force the test pattern
to incorporate a precondition for some action in the main step since avoiding a precondition
violation is one purpose for a preparation step. The learner's teleological rationalizations
about test patterns are represented by a set of step schemata.

Another issue discussed in this chapter concerns inducing the skeletons of new subprocedures. The
skeleton of a new subprocedure is its goals and rules, stripped of the goal arguments, patterns and
action arguments. Choosing a skeleton for the new subprocedure is InduceSkeleton's job (see
section 16J). The choice of skeleton is totally determined by choosing the parent OR goal for the
subprocedure and the set of subgoals that the subprocedure will call (called the subprocedure's
kids). Thus, the issue to be discussed is twofold: (1) under which goal in the existing procedure
should InduceSkeleton attach the new subprocedure, and (2) which goals should
InduceSkeleton have the new subprocedure call? Induction often leaves several choices open.

Human learners exhibit distinct biases in their choice of subprocedure skeletons. The two
hypotheses concerning test pattern bias are extended to cover skeleton bias. The topological bias,
which has the learner choose maximally general test patterns, is extended to bias the learner to
choose a place for the subprocedure that will make the new subprocedure be as small as possible.
Later it will be shown how minimizing the size of a new subprocedure increases its generality. The
teleological hypothesis is that step schemata control the placement of steps.

19.1 The topological bias hypothesis

To see what skeleton induction involves, a computer science fixture is needed: the trace of a
procedure's execution. A trace is a tree erected over a problem state sequence that shows the
program's subroutine calling history. The usual way to generate a trace is by placing "print"
statements just before and just after subroutine calls. This generates a long, printed listing. A trace
can also be presented as a tree. Figure 19-1 shows the trace tree for a correct subtraction procedure
(in fact, the one of figure 2-6) solving a BFZ (i.e., borrow from zero) problem. Each call is shown
as a tree node, with its arguments abbreviated. A trace tree is just a parse tree for the problem
state sequence, using the procedure as the grammar. This is a fundamental concept in the following
discussion. Parsing problem state sequences defines which skeletons exist. Seeing what a skeleton
is becomes simple now that trace trees have been introduced. If the procedure is missing the BFZ
goal, then the trace tree would have a hole in the middle of it, as in figure 19-2. The gap is right
where the BFZ node would be. From the figure, one can see that a skeleton can be characterized
by the link coming into it from above, and the links leaving it from below. Since a skeleton
represents where in the parse tree a new subprocedure should go, it's no surprise that its defining
aspects are topological. More formally, a skeleton is uniquely specified by a pair:

Parent: The name of the OR goal that the new subprocedure will be attached to.

Kids: An ordered list of goal names. These will become the actions on the AND goal rules of the
new subprocedure. Note that only an action's name and not its arguments appear here.

Almost all problem state sequences, including the example of figure 19-2, admit more than one
parent-kids pair. Most of the ambiguity is due to the fact that one can almost always make a
skeleton bigger. The kids can be lower in the tree; the parent can be higher. Figures 19-3 and 19-4
show some skeletons for the BFZ problem state sequence. Figure 19-3 has lower kids. Figure 19-4
has a higher parent. Any node that would complete an otherwise incomplete trace tree is a
legitimate skeleton.


When a skeleton is expressed by <parent, kids>, then the hypotheses of the theory entail that
subprocedure placement can be partially solved simply by intersection of these pairs. That is, for
each example in a lesson, there is a set of possible skeletons; the learner takes the intersection over
these sets. Because a lesson may introduce just one subprocedure, all the skeletons' parents must be
the same. Because the new subprocedure is disjunction-free, each skeleton's list of kids must be
equal to each other skeleton's list of kids. In particular, two kid lists (A B C) and (A D C) cannot
be merged by using disjunction on the middle kid to form something like (A (OR B D) C).
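
Skeleton intersection can be sketched directly, assuming each example's candidate skeletons are
given as a set of (parent, kids) pairs with the kids held in a tuple so that their order matters.
The goal names P, Q, A, B, C and D are placeholders, and the function name is hypothetical.

```python
def intersect_skeletons(per_example):
    """Keep only the (parent, kids) pairs that every example admits."""
    sets = [set(s) for s in per_example]
    return set.intersection(*sets) if sets else set()


# Placeholder candidates for two examples of one lesson.
example_1 = {("P", ("A", "B", "C")), ("Q", ("A", "C"))}
example_2 = {("P", ("A", "D", "C")), ("Q", ("A", "C"))}
survivors = intersect_skeletons([example_1, example_2])
```

Because pairs are compared for equality, the kid lists (A B C) and (A D C) are simply different
skeletons and both drop out; nothing merges them into a disjunction.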

Skeleton intersection is powerful enough that it is sometimes possible to devise a lesson that
yields a unique skeleton when its examples' skeletons are intersected. However, for some lessons,
this is not possible. In fact, the BFZ skeleton that is our running illustration cannot be uniquely
specified by examples. The proof relies on the distribution of branches in the AOG that is input to
the learner. The AOG is shown in figure 19-5. If the examples are designed to exemplify BFZ,
then they will always allow the skeleton to have BORROW/FROM as parent. However, because
BORROW/FROM is the first subgoal of REGROUP, a second skeleton is always possible. It is a
skeleton whose parent is 1/BORROW. This skeleton is shown in figure 19-4. It will always be legal
no matter what the example. So, the intersection of skeletons over all examples will always have
both skeletons.

Figure 19-1
Trace tree for solution of a BFZ problem.

Figure 19-2
The skeleton is right where the BFZ node and its daughters were.

Figure 19-3
The kids of the skeleton can be lower.

Figure 19-4
The parent of the skeleton can be higher.

Figure 19-5
AOG for a subtraction procedure that doesn't know how to BFZ.



Perhaps even more telling than this result is the fact that textbooks often do not vary the
examples enough that skeleton intersection would converge even when the AOG topology would
permit such convergence. For instance, many textbooks introduce BFZ using only three-column
problems despite the fact that students often know how to handle four-column subtraction already.
When only three-column problems are used, a skeleton whose parent is 1/SUB (see figure 19-5)
also survives skeleton intersection. That is, the learner can't tell whether BFZ is a prefix to the
whole of the subtraction procedure, or only a prefix to one column's processing. If this skeleton at
the root level is not somehow filtered out by the theory, then it will survive to predict a star bug
(the star bug can borrow-from-zero only when the BFZ originates in the units column). In short,
something other than skeleton intersection, i.e., the learner's skeleton bias, is shouldering quite a bit
of the learning load.



Potential generality

The problem of skeleton induction is a general problem for learners of structural concepts.
Iba's arch learner is a good illustration of the problem in a familiar domain (Iba, 1979). Iba's arch
learner doesn't know about PRISM, the disjunction of BRICK, WEDGE, and a few other block types.
Lacking this prefabricated disjunction, when the learner sees the appropriate examples for inducing
that the lintel of an arch is a PRISM, it must create a disjunction. The disjunction that Iba chooses
is:

a. (OR (AND (ISA LINTEL 'BRICK)
            (ISA LEG2 'BRICK)
            (ISA LEG1 'BRICK)
            (SUPPORTS LEG1 LINTEL)
            (SUPPORTS LEG2 LINTEL))
       (AND (ISA LINTEL 'WEDGE)
            (ISA LEG2 'BRICK)
            (ISA LEG1 'BRICK)
            (SUPPORTS LEG1 LINTEL)
            (SUPPORTS LEG2 LINTEL)))

This concept is just the disjunction of the brick-lintel arch's description and the wedge-lintel arch's
description. However, Iba could have implemented the learner to construct a different disjunction:

b. (AND (OR (ISA LINTEL 'BRICK)
            (ISA LINTEL 'WEDGE))
        (ISA LEG2 'BRICK)
        (ISA LEG1 'BRICK)
        (SUPPORTS LEG1 LINTEL)
        (SUPPORTS LEG2 LINTEL))

This concept disjoins only the type of the lintel. It is logically equivalent to the other concept.
Neither is more general than the other. They have exactly the same extensions. However, the
smaller disjunction, b, is more easily generalized in the future. Suppose Iba wanted to generalize
the types of the legs from BRICK to PARALLELEPIPED (a solid with two faces that are parallel and
the same shape, e.g., cylinders and prisms). If Iba's examples used cylindrical legs and brick lintels,
then a and b would become, respectively, concepts c and d:

c. (OR (AND (ISA LINTEL 'BRICK)
            (ISA LEG2 'PARALLELEPIPED)
            (ISA LEG1 'PARALLELEPIPED)
            (SUPPORTS LEG1 LINTEL)
            (SUPPORTS LEG2 LINTEL))
       (AND (ISA LINTEL 'WEDGE)
            (ISA LEG2 'BRICK)
            (ISA LEG1 'BRICK)
            (SUPPORTS LEG1 LINTEL)
            (SUPPORTS LEG2 LINTEL)))

d. (AND (OR (ISA LINTEL 'BRICK)
            (ISA LINTEL 'WEDGE))
        (ISA LEG2 'PARALLELEPIPED)
        (ISA LEG1 'PARALLELEPIPED)
        (SUPPORTS LEG1 LINTEL)
        (SUPPORTS LEG2 LINTEL))

In concept c, only the first disjunct has been generalized. The examples used only brick lintels, so
there is no evidence that the wedge-lintel disjunct should be generalized. Consequently, concept c
will not match an arch with cylindrical legs and a wedge lintel, but concept d will. The d
conception of the arch is more general. Moreover, its generality is due only to the placement of the
earlier disjunction, because the same inductive biases and the same examples were used to generate c
and d from their predecessors. To coin a phrase, concept b has more potential generality than
concept a. Given two concepts that are equivalent in generality, one is a potential generalization of
the other if there exists an inductive algorithm and a set of training examples that makes the first
concept more general than the second concept.
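
The difference between c and d can be checked mechanically. The sketch below is an assumption-laden
illustration, not Iba's learner: concepts are written as nested Python tuples, the ISA table
holds only the two type links needed for the example, and an arch is a dictionary of part types
plus a set of support facts.

```python
# Type links assumed for this example only.
ISA = {"BRICK": "PARALLELEPIPED", "CYLINDER": "PARALLELEPIPED"}


def isa(kind, target):
    """True if `kind` equals `target` or lies below it in the ISA table."""
    while kind is not None:
        if kind == target:
            return True
        kind = ISA.get(kind)
    return False


def holds(concept, arch):
    """Evaluate an AND/OR concept description against one arch."""
    op, *args = concept
    if op == "AND":
        return all(holds(c, arch) for c in args)
    if op == "OR":
        return any(holds(c, arch) for c in args)
    if op == "ISA":
        part, target = args
        return isa(arch["types"][part], target)
    if op == "SUPPORTS":
        return tuple(args) in arch["supports"]
    raise ValueError(f"unknown operator: {op}")


SUPPORTS = (("SUPPORTS", "LEG1", "LINTEL"), ("SUPPORTS", "LEG2", "LINTEL"))
BRICK_LEGS = (("ISA", "LEG2", "BRICK"), ("ISA", "LEG1", "BRICK"))
GENERAL_LEGS = (("ISA", "LEG2", "PARALLELEPIPED"), ("ISA", "LEG1", "PARALLELEPIPED"))

concept_c = ("OR",
             ("AND", ("ISA", "LINTEL", "BRICK"), *GENERAL_LEGS, *SUPPORTS),
             ("AND", ("ISA", "LINTEL", "WEDGE"), *BRICK_LEGS, *SUPPORTS))
concept_d = ("AND",
             ("OR", ("ISA", "LINTEL", "BRICK"), ("ISA", "LINTEL", "WEDGE")),
             *GENERAL_LEGS, *SUPPORTS)

# An arch with cylindrical legs and a wedge lintel.
arch = {"types": {"LINTEL": "WEDGE", "LEG1": "CYLINDER", "LEG2": "CYLINDER"},
        "supports": {("LEG1", "LINTEL"), ("LEG2", "LINTEL")}}
c_matches = holds(concept_c, arch)
d_matches = holds(concept_d, arch)
```

Under these assumptions concept d accepts the arch and concept c rejects it, which is the
difference in generality described above.
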

Biases for subprocedure placement

It is not hard to see that in an And-Or language, such as Iba's representation of arches or
Sierra's AOGs, choosing a smaller disjunction gives the resulting concept greater potential generality.
The precise definition of "smaller" depends on the representation language. The skeleton of figure
19-2 yields a procedure with greater potential generality than the skeleton of figure 19-4. These
considerations of potential generality motivate the hypotheses which define the maximal generality
bias for subprocedure placement:

Lowest parent

Given two subprocedures, A and B, for possible addition to a procedure P, if A is lower
than B in that there is a path from P's root to A that passes through B, then
InduceSkeleton chooses A.

Fewest kids

If two subprocedures, A and B, have the same parent OR, and A has fewer kids
than B, then InduceSkeleton chooses A.

As the name "lowest parent" indicates, the first hypothesis biases the learner to choose skeletons
whose parents are low in the parse tree. This hypothesis chooses the skeleton of figure 19-2 over
the skeleton of figure 19-4 because BORROW/FROM is lower than 1/BORROW.

The fewest kids hypothesis biases the learner to choose actions for the new subprocedure that
make maximal use of the old procedure. For illustration, compare the kids of the skeleton of figure
19-2 with the kids of the skeleton in figure 19-3. The latter skeleton is also a possible parse of the
BFZ example's problem state sequence. It has three kids. Note that it does not have REGROUP as
a kid. This means that the new subprocedure constructed from this skeleton will not be
recursive. It won't be able to borrow across multiple zeros. The skeleton of 19-2 can. Hence, the
skeleton of 19-2 is already more general than the skeleton of 19-3. This illustrates how the fewest
kids hypothesis implements a bias towards maximal generality.
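
The two hypotheses can be sketched as filters over candidate skeletons. The AOG fragment below
contains only the links named in the text (1/BORROW above REGROUP, BORROW/FROM as a subgoal of
REGROUP, BORROW/FROM above OVRWRT); the rest of the AOG is omitted. The kid names K1 through K5
are placeholders, since the figures' actual kids are not given in the text, and the function names
are hypothetical.

```python
# Fragment of the AOG, goal -> subgoals. Only links named in the text appear.
CHILDREN = {"1/BORROW": ["REGROUP"],
            "REGROUP": ["BORROW/FROM"],
            "BORROW/FROM": ["OVRWRT"]}


def is_below(a, b):
    """True if a can be reached by going down from b (a path to a passes through b)."""
    stack, seen = list(CHILDREN.get(b, [])), set()
    while stack:
        node = stack.pop()
        if node == a:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(CHILDREN.get(node, []))
    return False


def choose(skeletons):
    """Apply lowest parent, then fewest kids among skeletons sharing a parent."""
    parents = {p for p, _ in skeletons}
    lowest = {p for p in parents if not any(is_below(q, p) for q in parents)}
    chosen = []
    for parent in sorted(lowest):
        group = [s for s in skeletons if s[0] == parent]
        fewest = min(len(kids) for _, kids in group)
        chosen += [s for s in group if len(s[1]) == fewest]
    return chosen


fig_19_2 = ("BORROW/FROM", ("REGROUP", "OVRWRT"))
fig_19_3 = ("BORROW/FROM", ("K1", "K2", "K3"))   # placeholder kid names
fig_19_4 = ("1/BORROW", ("K4", "K5"))            # placeholder kid names
chosen = choose([fig_19_2, fig_19_3, fig_19_4])
```

Lowest parent removes the figure 19-4 candidate, and fewest kids then prefers the two-kid
skeleton of figure 19-2 over the three-kid skeleton of figure 19-3.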

The fewest kids hypothesis often leaves an important choice unmade. In an AOG, every AND
has an OR just above it. Hence, whenever an AND occurs as a kid in a skeleton, it will always be
possible to use the OR instead. For instance, the skeleton of figure 19-2 has REGROUP and
OVRWRT as kids. It could equally well have the OR nodes just above them as kids. In fact, there
are four possible kid lists:

1. 1/BORROW  BORROW/FROM
2. REGROUP  BORROW/FROM
3. 1/BORROW  OVRWRT
4. REGROUP  OVRWRT

All these choices are legal with respect to the lowest parent hypothesis and the fewest kids
hypothesis.

The maximal generality bias would advise taking the highest nodes, namely choice 1.
Surprisingly, this is not what students do. This particular procedure does not offer a good example,
but the character of the evidence can be indicated. Suppose BORROW/FROM is chosen as the
second kid because it is higher than OVRWRT. This choice makes BFZ recursive, because the
skeleton parent is also BORROW/FROM. (Actually, because this particular procedure has REGROUP,
BFZ would be recursive anyway. Some core procedures lack REGROUP, because they are taught
with lesson sequences that do not have special regrouping lessons, such as lesson sequence HB,
which was discussed in chapter 2. For such core procedures, BFZ will only be recursive after this
single-zero BFZ lesson if BORROW/FROM is chosen as the second kid.) Thus, after a lesson on BFZ
that uses only single-zero problems, such as a,

[Worked problems a through d; the borrowing scratch marks and answer rows are not legible in the source.
a. 306 - 128     b, c, d. 3002 - 1238]

the student can do multi-zero BFZ, as shown in b. However, there are several bugs that indicate
students do not always achieve such competence from the single-zero lesson. One bug's work is
shown in c. It does not know how to borrow from multiple zeros. Hence, when it does the
problem, it winds up trying to decrement the zero in the hundreds column. This violates a
precondition: zero cannot be decremented. The solver impasses and repairs with Noop. At the end
of the units column, the problem looks like d. This bug is called Stops-Borrow-At-Multiple-Zero.
There are a few other bugs like it resulting from different repairs to the same impasse. The
existence of these bugs shows that not all students acquire a recursive BFZ from single-zero BFZ
lessons. In order to generate these bugs, the maximal generality bias cannot be used to resolve the
four-way kids choice mentioned above. There are similar illustrations involving the main column
traversal loop (i.e., if a maximally general kid is chosen, then the learner acquires the loop too
soon). These facts motivate the following hypothesis:

Lowest kids

If two subprocedures, A and B, have the same parent OR and the same number of
kids, and each of A's kids is lower than or equal to the corresponding kid in B,
then InduceSkeleton chooses A, where "lower" is defined as in the lowest
parent hypothesis.

Given the four-way choice mentioned above, this hypothesis has the learner pick choice 4 as the 
skeleton's kids. 
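
The four-way choice can be worked through in a few lines. This sketch models "lower" only by the
immediate link from each AND goal to the OR goal just above it, which is all that the four kid
lists require; the function names are hypothetical.

```python
# Each AND goal mapped to the OR goal just above it, as described for figure 19-2.
OR_ABOVE = {"REGROUP": "1/BORROW", "OVRWRT": "BORROW/FROM"}


def lower_or_equal(x, y):
    """True if goal x is y itself or the AND goal directly beneath the OR goal y."""
    return x == y or OR_ABOVE.get(x) == y


def kidwise_lower(a, b):
    """True if each kid of a is lower than or equal to the corresponding kid of b."""
    return len(a) == len(b) and all(lower_or_equal(x, y) for x, y in zip(a, b))


def lowest_kids(choices):
    """Kid lists that are kid-by-kid lower than or equal to every alternative."""
    return [a for a in choices if all(kidwise_lower(a, b) for b in choices)]


def highest_kids(choices):
    """What a maximal generality bias would pick instead."""
    return [a for a in choices if all(kidwise_lower(b, a) for b in choices)]


CHOICES = [("1/BORROW", "BORROW/FROM"),   # choice 1
           ("REGROUP", "BORROW/FROM"),    # choice 2
           ("1/BORROW", "OVRWRT"),        # choice 3
           ("REGROUP", "OVRWRT")]         # choice 4
picked = lowest_kids(CHOICES)
rejected_alternative = highest_kids(CHOICES)
```

The lowest kids hypothesis selects choice 4, whereas a maximal generality bias applied to kids
would have selected choice 1.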

It is not entirely clear why this hypothesis exists, given the learner's general bias toward
maximal generality. It seems that the maximal generality bias applies only to the choice of when
new subprocedures are executed and not to what the new subprocedure's subgoals will be. The
choice of subgoals (= kids) is perhaps governed by the same general bias as the choice of fetch
pattern. Fetch patterns are chosen to maximize their specificity. The lowest kids hypothesis also
maximizes specificity.



Test pattern bias 

Having discussed the biases relevant to skeletons, it is time to turn to the other half of 
inducing the "when" part of new subprocedures: test pattern induction. The maximal generality 
bias is easily formalized to apply to test patterns: 

Maximally general test patterns

Given that <S,G> is the version space for a test pattern, InduceTest chooses a g ∈ G as
the test pattern.

19.2 Step schemata

There is a completely different approach to choosing skeletons and test patterns. It postulates
that the student has teleological knowledge. Instead of using generality as the criterion, the learner
selects only those subprocedures that can be recognized as instances of general teleological
rationalizations. For the sake of discussion, these rationalizations will be taken to be step schemata
(Goldstein, 1974). Several step schemata will be presented before discussing the overall quality of
the approach.



Three step schemata 

Perhaps the most important step schema is the preparatory step schema. It matches skeletons
that insert some material just before an existing action, presumably to insure that the action's
preconditions are met. Technically, the recognition pattern for the preparatory step schema
merely checks that the last kid in the skeleton's kid list is a goal that will wind up being a sister to
the goal of the new subprocedure. This is easier to understand with the aid of figure 19-6. The
preparatory step schema applies only if the new subprocedure prepares for an existing action. Thus,
if 19-6a is a fragment of the learner's input AOG, then the new subprocedure will have to have
either Act1 or Act2 as its last kid. In the resulting procedure (19-6b), the new AND goal (New)
has a few preparatory actions (X and Y), then the action being prepared for (Act2). Borrowing
could be acquired as a preparatory subprocedure for the main column processing operation. Let
Diff abbreviate that operation, which is (Write A (Sub (Read T) (Read B))). So Borrow is a
preparatory subprocedure for Diff. Similarly, BFZ could be acquired with a preparatory step
schema. In both cases, the teleology of preparation is semantically correct. This is not in general
true of the application of step schemata. Sometimes the purpose attributed to the acquired
subprocedure only seems correct to the student, when in fact the correct purpose is completely
different.

The cleanup step schema is the dual of the preparatory step schema. It matches skeletons that
insert material just after an existing action. Goldstein (1974) calls the preparatory step and the
cleanup step schemata interface steps since they take care of the details of meshing a main step into
a sequence of other steps.




Figure 19-6
Fragments of an AOG before (a) and after (b) the preparatory step schema has applied.

Figure 19-7
Fragments of an AOG before and after the loop step schema has applied.
The boxed AND node indicates a recursive call.



Another step schema is the loop schema. It detects skeletons that are intended to introduce
loops. Its recognition condition is somewhat more complicated. The skeleton's kid list must be two
kids long. Both kids must be the same action, and moreover, that action must match a sister of the
skeleton parent. Furthermore, the parent must be called from an AND that also has an instance of
the goal. Figure 19-7 illustrates this. Essentially, the schema converts a procedure that can handle
a single or double occurrence of Act1 into a recursive procedure that handles arbitrarily long
sequences of Act1's. It does this when it detects a triple occurrence of Act1. The loop schema
replaces one of the skeleton's kids with the right goal to create a tail recursion.

It is important to note that the loop schema can be configured differently. By leaving off part
of its recognition condition, one can have it detect loops when only two occurrences of Act1 are in
the example. This predicts that a student who is shown two-column subtraction for the first time
could infer the tail recursion needed for multicolumn subtraction. This prediction seems too strong
to me, although I have no evidence against it. For what it's worth, one young student solemnly told
a colleague, "It takes three to see a pattern. If there's only two, you don't have a pattern." The
variability in the definition of the loop schema illustrates that step schemata can increase the
tailorability of the theory. If the student had said, "It takes four to see a pattern," one could easily
construct the corresponding loop schema.



Constraints on test patterns

More subtly, the meaning of some schemata, such as the preparatory step schema, entails some
constraints on the patterns that are constructed to fill out the skeleton. The preparatory step
schema seems intuitively to involve the notion of satisfying some precondition of the action being
prepared for. This would entail that some precondition of the action ought to be a part of the test
pattern of the new subprocedure. Thus, since T<B is a precondition of Diff, T<B ought to be part
of the rule that pushes for borrowing.

In fact, step schemata must have such constraints on test patterns. If they don't, then they
admit some empirically flawed skeletons. In fact, they admit all the skeletons that the lowest parent
hypothesis rules out. Re-examination of figures 19-3 and 19-4 shows that both these skeletons
meet the topological requirements of the preparatory step schema. In order to filter out the
skeletons that lead to unobserved bugs, the preparatory step schema must either adopt the lowest
parent hypothesis, blatantly violating Occam's razor, or it must use test patterns and preconditions
in filtering the larger skeletons.

An experiment with step schemata 

The three step schemata mentioned above were implemented, and a couple of subtraction
curricula were run through Sierra. Although not the most extensive test in the world, this exercise
indicated enough serious flaws in the step schema framework to warrant its rejection.

Before discussing those flaws, it is worth mentioning that this was not an expected or a
welcome result. I had fully expected the step schemata to suffice for skeleton filtering. It seemed
intuitively obvious at the time that skeleton acquisition ought to be constrained by the student's
general knowledge about procedures. Indeed, there were a number of talks where I sketched the
notion of teleological rationalizations in glowing terms (step theory was given its name back in those
days). There was a grand research programme waiting in the wings. The study of learning, as it
occurs in current classes, was seen as paving the way for improved curricula: In the present,
descriptive study, the step schemata of the naturally occurring teleology would be uncovered and
formalized. Then the procedures and the curricula could be overhauled to conform to the natural
teleology. However, the present study has uncovered no traces of a rationalization-based teleology.
At this time, it appears that the data conform best to a simpler model, the topological one.

Step schemata block bugs

One problem with step schemata is that the learning they predict is too good to be true.
When constructed to reflect intuitively salient teleology, they prevent the learner from acquiring
several observed bugs. For instance, suppose the preparatory step schema constrains test patterns to
check the preconditions of the action being prepared for. The rationalization for this is that if the
precondition is satisfied, there is no need to prepare; if it's false, the newly acquired subprocedure
should be executed. In subtraction, both borrowing and BFZ are instances of this preparatory step
schema. If the precondition checking teleology is built into the schema, then several borrowing and
BFZ bugs cannot be generated by subprocedure acquisition (e.g., N-N-Causes-Borrow,
Borrow-Treat-One-As-Zero). To regain the generation of these bugs, one would have to elaborate the
schema's constraints or dispense with them altogether.

In a larger view, this flaw takes on added importance. Fixing precondition violations with
preparatory steps has been a fixture in every study of teleology that I know of. If this rudimentary
notion cannot be sustained undamaged by the data, then prospects look dim for teleological
rationalizations as a whole.

Step schemata and tailorability 

The worst problem with using step schemata is that the theorist must either fix the set of step
schemata once and for all or live with a highly tailorable model. The tailorability makes the theory
difficult to refute. As a case in point, I once thought there might be a schema for the notion of
preprocessing. In particular, it seemed that borrowing could be construed as a preprocessing step for
the multi-column traversal. Using this schema, the learner would acquire a procedure with two
loops. The first loop would make all the columns easy by borrowing whenever necessary. The
second loop would move across the columns again, taking the column difference. The
preprocessing version of subtraction makes teleological sense, and generates a few unique bug
predictions. However, none of these predictions have been verified by the data. Does this mean
that the whole step schemata approach is wrong, or does it mean only that there is no preprocessing
schema? There is no way to know without collecting more data. Not knowing whether the theory
is really right is the price one sometimes pays for having a highly tailorable theory.

In short, the teleological bias will be rejected partly because one particular version of it does
worse than the topological bias at predicting bugs, and partly because it is too difficult to refute in
general.

19.3 Summary

Test pattern induction is one of the most critical issues in the theory. Several bugs depend
directly upon the biases used in their acquisition. Nonetheless, it has proved a very tricky issue to
discover what those biases are. Two positions were considered. One was topological in character.
Given a version space of all patterns that are consistent with the lesson's positive and negative
instances, the topological bias has the learner keep a maximally general pattern as the test pattern.
The other position is based on teleological rationalizations, cast as step schemata. These schemata
act as filters on the possible generalizations of the examples offered by the otherwise unbiased
inducer. The prototypical step schema is the preparatory step schema. It sanctions subprocedures
that can be construed as preparing for some already existing step in the procedure.

The differences between the two kinds of bias are subtle despite the fact that the positions they
represent view learning in radically different ways. The topological view is basically an empiricist
viewpoint while the teleological viewpoint is strongly nativist. Nonetheless, the data do not argue
strongly for one over the other. The issue is settled mostly on grounds of tailorability. Under one
reasonable interpretation of the teleological bias, several bugs could not be generated. The bugs
could not be generated because the teleology of the preparatory step schema matches the correct
teleology for subtraction rather closely. Hence, it prefers correct subtraction procedures over the
buggy ones. It cannot, therefore, generate certain bugs that the topological bias can generate.
However, there is nothing sacred about the particular schemata that were used. If they were
replaced by less stringent ones, then perhaps the bugs could be generated. This points out just how
much control the theorist has over the predictions of the theory when step schemata are used. It is
this tailorability that led to the downfall of the teleological approach.

The chapter discussed a second kind of learning that is very strongly associated with test
pattern induction. It is the choice of skeleton for the new subprocedure. The skeleton expresses
the attachment point of the new subprocedure to the procedure's current goal hierarchy.
Essentially, skeleton induction is the control structure analog to pattern induction. It decides
"when" in the control environment to execute the new subprocedure; test pattern induction decides
"when" in the external environment to execute it. The two biases just discussed, the topological
bias and the teleological bias, extend to skeleton induction. Each becomes a little more
complicated, since the topology of the control structure is a little more complicated than the
topology of test patterns (i.e., trees versus sets). However, the same basic results are found for
skeleton induction. Both biases work, but the topological bias is less tailorable.

The bias level, chapters 17 to 19, uncovered and formalized the learner's biases concerning
pattern induction and skeleton induction. It began by fulfilling a promise made earlier to show that
test patterns and fetch patterns are actually distinct. This yielded the following hypothesis:

Two patterns 

The representation uses different patterns for testing rule applicability and for focus
shifting: Test patterns are used in choosing which OR rule to execute. Fetch patterns are
used to shift the focus of attention (data flow).

This distinction was needed by the representation level in order to define the procedure
representation language.

From the standpoint of learning, the representation language defines the form of patterns and
subprocedure skeletons. Induction's job is to find all possible ways to fill in their forms in such a
way that they are consistent with the examples. Because the representational hypotheses are so
constraining, it is technically feasible to generate all these possibilities and check some of the
resulting predictions. It was shown that pure, unbiased induction overgenerates. It acquires
skeletons and patterns that human learners do not. Apparently, students have some biases.

Topological biases vs. teleological biases 

The bias level contrasted two kinds of bias. Topological biases (which are ultimately shown to
be the better biases) are based on maximizing or minimizing generality. They are "knowledge free"
in that the biases can be computed directly from the topology of patterns and skeletons.
Teleological biases postulate a knowledge base that contains teleological rationalizations. A
teleological rationalization invents a plausible purpose for a new subprocedure. The bias is to
accept only subprocedures that appear to have some purpose. For the sake of discussion,
teleological rationalizations are represented as step schemata (Goldstein, 1974). The prototypical
step schema is the preparatory step schema. It sanctions subprocedures that can be construed as
preparing for some already existing step. For instance, borrowing might be rationalized (correctly,
in this case) as preparation for taking the column difference.

The evidence does not clearly favor topological biases over the teleological ones, except in the
case of fetch patterns. For fetch patterns, the grammar serves as a repository for teleological
rationalizations. Given that the grammar has, for instance, the teleological notion that boundary
conditions are important, topological biases for fetch patterns yield high quality predictions. In the
case of test patterns and skeletons, more than the grammar is needed to bias their induction.
Teleological rationalizations would serve adequately, it seems. However, having a knowledge base
of step schemata introduces a great deal of tailorability into the theory. Topological biases
introduce no tailorability at all, yet they seem as good or better than the teleological biases in their
empirical predictions. The remainder of this chapter discusses the topological biases.

The topological biases factor the bias issues along the lines of a subprocedure's components.
There are four components of interest: (1) the test pattern on the adjoining rule, (2) the fetch
patterns on the new AND's rules, (3) the subprocedure's parent OR, and (4) the subprocedure's kids.
The parent OR is the OR goal that the new subprocedure will be underneath. The kids are goals
that the new subprocedure will call. The biases concerning these four components will be discussed
in turn.

Topological biases for "when" 

The test pattern and the parent OR are related in that both concern when the new
subprocedure may be executed. The test pattern expresses external conditions. It functions as a
predicate on problem states. The parent OR expresses internal conditions. It functions as a
predicate on the top of the goal stack. Only if the parent OR goal is on the top of the stack may
the procedure execute the new subprocedure. It is somewhat surprising that both aspects of "when"
(the parent OR and the test pattern) are subject to the same topological bias. In both cases, the
learner prefers maximizing generality. First test pattern bias will be discussed, then parent OR bias.

The range of test patterns that the model's inducer picks from is represented as a version
space, <S,G>, where G is the set of maximally general patterns and S is the set of maximally
specific patterns. As examples are fed to the inducer, these two sets creep toward each other. The
maximally general patterns in G become more specific. The maximally specific patterns in S
become more general. Induction could be unbiased if it always happened that they came together
(G=S). In this case, their contents would be the only patterns consistent with the examples, and
bias would be superfluous. Given actual lessons, S and G never come close. A typical S pattern
has a hundred relations; a typical G pattern has a half-dozen relations. The inducer may choose
any pattern that is between the two sets G and S (technically, any pattern that is a subgraph of
some s∈S and a supergraph of some g∈G). The learner has on the order of 2*** possible patterns
to choose among. The bias is to choose only patterns in G:

Maximally general test patterns

Given that <S,G> is the version space for a test pattern,
InduceTest chooses any g∈G as the test pattern.
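
The bias can be illustrated with a minimal sketch. It assumes, purely for illustration, that a pattern is a set of relation names and that one pattern is more general than another when it is a subset of it; the relation names and the helper names `between` and `induce_test` are hypothetical and are not taken from Sierra.

```python
from itertools import combinations


def between(S, G):
    """Enumerate every pattern p with g <= p <= s for some g in G and s in S."""
    found = set()
    for s in S:
        for g in G:
            if not g <= s:
                continue  # g does not generalize s, so nothing lies between them
            extra = sorted(s - g)
            for k in range(len(extra) + 1):
                for added in combinations(extra, k):
                    found.add(g | frozenset(added))
    return found


def induce_test(S, G):
    """Maximally general bias: pick a member of G (deterministically, for the demo)."""
    return min(G, key=lambda g: (len(g), sorted(g)))


# Hypothetical relations; a real S pattern would have on the order of a hundred.
S = {frozenset({"T<B", "top-is-digit", "bottom-is-digit", "leftmost-col"})}
G = {frozenset({"T<B"})}

candidates = between(S, G)
chosen = induce_test(S, G)
print(len(candidates), sorted(chosen))
```

Even in this toy case the unbiased inducer has eight consistent patterns to choose from; the bias collapses the choice to the members of G.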

Maximal generality is also the bias for choosing parent ORs. However, it is generality of a different
kind. Skeleton induction first locates all possible parent ORs. That is, any of the parent ORs would
lead ultimately to a new procedure that is consistent with all the examples. These parent ORs are all
logically equivalent, in a sense. No choice of parent OR is more general than the others. However,
choosing a parent OR that is low in the AOG (i.e., far from the root) causes future subprocedure
acquisitions to create more general procedures. This is a new conception of generality, so it
deserves a moment's discussion. Suppose one were adjoining disjunctions to And-Or propositions,
starting with (AND A B). If the inducer is shown CB as an example, the following logically
equivalent expressions would both be consistent generalizations of the example:

a. (AND (OR A C) B)          b. (OR (AND A B) (AND C B))

Expression a represents adjoining a subprocedure C to a low parent; b represents adjoining to a
higher parent. When the inducer is shown AD, causing a second subprocedure D to be adjoined, the
two expressions produce expressions that are not logically equivalent:

c. (AND (OR A C) (OR B D))   d. (OR (AND A (OR B D)) (AND C B))

Whereas c is true of all four of {AB, AD, CB, CD}, expression d is true of just the first three.
Expression c has more generality, yet the cause of its generality is the initial attachment of C. The
low attachment of C gave expression a greater potential generality. Given two logically equivalent
expressions, one expression has more potential generality if there exists an induction algorithm and
a sequence of examples that makes the first expression more general than the second.

The learner's bias for parent ORs is to choose the skeleton that maximizes potential generality.
It chooses the lowest parent OR. The hypothesis says exactly this, albeit in a somewhat technical
way.

Lowest parent 

Given two subprocedures, A and B, for possible addition to a procedure P, if A is lower
than B in that there is a path from P's root to A that passes through B, then
InduceSkeleton chooses A.
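
The And-Or example that motivates this hypothesis can be checked mechanically. The sketch below assumes one reading of the notation, namely that (AND X Y) accepts an X item followed by a Y item and (OR X Y) accepts either; the function name `denote` and the tuple encoding are illustrative choices, not part of the theory.

```python
def denote(expr):
    """Return the set of strings that an And-Or expression accepts."""
    if isinstance(expr, str):
        return {expr}  # an atom accepts only itself
    op, *args = expr
    if op == "OR":
        return set().union(*(denote(a) for a in args))
    if op == "AND":
        result = {""}
        for a in args:
            result = {left + right for left in result for right in denote(a)}
        return result
    raise ValueError(f"unknown operator: {op}")


# Expressions a-d from the text, encoded as nested tuples.
a = ("AND", ("OR", "A", "C"), "B")
b = ("OR", ("AND", "A", "B"), ("AND", "C", "B"))
c = ("AND", ("OR", "A", "C"), ("OR", "B", "D"))
d = ("OR", ("AND", "A", ("OR", "B", "D")), ("AND", "C", "B"))

print(sorted(denote(a)), sorted(denote(b)))
print(sorted(denote(c)), sorted(denote(d)))
```

Under this reading, a and b accept the same strings, while c (grown from the low attachment) accepts CD and d (grown from the high attachment) does not.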

Topological biases for "what"

We just saw that the biases for when to execute a subprocedure are based on maximizing
generality. The other two components, fetch patterns and kids, are subject to the opposite bias.
The learner chooses them in order to maximize specificity. The fetch pattern describes where the
subprocedure's action takes place. The kid describes what the action is. Essentially, the two
together establish what the subprocedure does. Viewed this way, it makes sense that they should
have the same bias.

The range of choices for a fetch pattern is a version space, just as with test patterns. The
learner's bias is to take a maximally specific pattern:

Maximally specific fetch patterns

Given that <S,G> is the version space for a fetch pattern,
InduceFetch chooses any pattern in S as the fetch pattern.

However, unlike maximal generality, maximal specificity is essentially unbounded. Patterns can get
infinitely large. If one pattern describes only a column and the other pattern describes the same
column in the context of a problem, then the second pattern is more specific. A pattern that
describes the same column and problem in the context of a page of problems would be an even
more specific pattern. To put a limit on this boundless specificity, some maximal size is needed.
Based on bug evidence, the following hypothesis seems to be the appropriate one:

Focus pattern

If <S,G> is the version space for a fetch pattern, and the fetch pattern is beneath goal G,
and the fetch pattern will provide arguments to goal SG, a subgoal of G, then let I (for
input) be the set of pattern variables corresponding to goal G's arguments and O (for
output) be the set of all variables used for providing arguments to SG. The part-whole
tree of a pattern in S is the minimal tree necessary to span the set I∪O. Variables
outside this part-whole tree are dropped from the pattern, along with all relations that
mention them.
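
A minimal sketch of this pruning follows. It assumes the part-whole structure is a single tree given as a child-to-parent map, and that relations are tuples of a name followed by the variables they mention; the variable names, relation names, and the function `focus_pattern` are all hypothetical.

```python
def path_to_root(node, parent):
    """List the node and its ancestors, from the node up to the root."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def focus_pattern(relations, parent, inputs, outputs):
    """Keep only relations whose variables lie in the minimal tree spanning I and O."""
    targets = set(inputs) | set(outputs)
    paths = [path_to_root(v, parent) for v in targets]
    common = set(paths[0]).intersection(*paths[1:])
    # The lowest common ancestor is the first shared node met walking upward.
    lca = next(n for n in paths[0] if n in common)
    keep = set()
    for p in paths:
        keep.update(p[: p.index(lca) + 1])
    return {r for r in relations if set(r[1:]) <= keep}


# Hypothetical part-whole tree: cells within columns within a problem on a page.
parent = {"cell_T2": "col2", "cell_B2": "col2",
          "col": "problem", "col2": "problem", "problem": "page"}
relations = {
    ("part-of", "col", "problem"), ("part-of", "col2", "problem"),
    ("part-of", "cell_T2", "col2"), ("part-of", "cell_B2", "col2"),
    ("left-of", "col2", "col"), ("part-of", "problem", "page"),
    ("above", "cell_T2", "cell_B2"),
}
pruned = focus_pattern(relations, parent, inputs={"col"}, outputs={"cell_T2"})
print(sorted(pruned))
```

Here the input is a column and the output is the top cell of the adjacent column, so the spanning tree reaches up to the problem but not to the page, and the bottom cell is dropped.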

Essentially, the fetch patterns are chosen to be the maximally specific patterns that might prove
useful in disambiguating fetches. The focus pattern hypothesis asserts that parts of the part-whole
tree that the input-output variables do not reside in are too distant to be useful in disambiguating
various ways to match the output variables.

The other half of the "what" bias concerns choosing kids for a skeleton subprocedure. As
with choosing a parent, unbiased induction leaves several choices open. Empirical evidence
motivates the following biases:

Fewest kids

If two subprocedures, A and B, have the same parent OR, and A has fewer kids
than B, then InduceSkeleton chooses A.

Lowest kids

If two subprocedures, A and B, have the same parent OR and the same number of
kids, and each of A's kids is lower than or equal to the corresponding kid in B,
then InduceSkeleton chooses A, where "lower" is defined as in the lowest
parent hypothesis.
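
The two hypotheses and their order of precedence can be sketched as a comparison function. This is an illustration under assumptions: each candidate is reduced to its list of kids (the shared parent OR is left implicit), the goal hierarchy is a child-to-parent map, and the goal names are invented for the example.

```python
def ancestors_or_self(goal, parent):
    """List the goal and every goal above it in the hierarchy."""
    chain = [goal]
    while goal in parent:
        goal = parent[goal]
        chain.append(goal)
    return chain


def lower_or_equal(a, b, parent):
    """True if the path from the root to a passes through b (or a is b)."""
    return b in ancestors_or_self(a, parent)


def prefer(kids_a, kids_b, parent):
    """Return the preferred kid list, or None if neither hypothesis decides."""
    if len(kids_a) != len(kids_b):  # fewest kids takes precedence
        return kids_a if len(kids_a) < len(kids_b) else kids_b
    if kids_a != kids_b:  # then lowest kids, compared position by position
        if all(lower_or_equal(x, y, parent) for x, y in zip(kids_a, kids_b)):
            return kids_a
        if all(lower_or_equal(y, x, parent) for x, y in zip(kids_a, kids_b)):
            return kids_b
    return None


# Hypothetical goal hierarchy: Sub > Multi > Column > Diff.
parent = {"Multi": "Sub", "Column": "Multi", "Diff": "Column"}

by_count = prefer(["Multi"], ["Diff", "Column"], parent)
by_depth = prefer(["Column"], ["Diff"], parent)
print(by_count, by_depth)
```

The first comparison shows the precedence: the single high kid wins over two lower kids, even though lowest kids alone would favor the latter.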

The lowest kids hypothesis is an instance of the bias toward maximal specificity. As with the parent
OR choice, it is potential specificity rather than current specificity that is being maximized. The
lowest kids hypothesis maximizes potential specificity. The other hypothesis, fewest kids, is a bias
toward generality. Surprisingly, it has precedence over the bias towards specificity, the lowest kids
hypothesis. The data clearly require that it have precedence: it is impossible to learn tail
recursive loops if the precedence is reversed. Why this is, is still a mystery.

The learners' topological biases fall into a coherent pattern. The following table illustrates it:

                 When                What
                 Max. generality     Max. specificity
    Skeleton     Parent OR           Kids
    Patterns     Test pattern        Fetch patterns

The biases for when to execute the new subprocedure are in favor of maximizing generality. The
learners prefer to use the new subprocedure as much as possible. They choose a parent OR and a
test pattern that will cause maximal usage of the new subprocedure. This is echoed in the conflict
resolution principle that stipulates that the most recently acquired rule is preferred whenever more
than one rule is applicable and has a true test pattern (see section 10.5). So, all principles fit into a
picture of students who tend to exercise their new subprocedures as much as possible.

The biases concerning what the subprocedure should do go the opposite direction. In the case
of fetch patterns, there is a clear preference for maximally specific patterns. In the case of the
skeleton's kids, there is a somewhat mixed preference for maximizing potential specificity.

Matching

The bias toward maximal specificity of fetch patterns makes sense. The basic job that a fetch
pattern does is disambiguate which of the many visible objects in the problem state the procedure
should shift its focus to. The best way to do that is to remember everything about the exemplary
fetches that might be even remotely useful (i.e., choose a maximally specific fetch pattern). During
problem solving, this highly specific description will often not apply exactly, but that is okay: the
procedure takes the closest match to the fetch pattern. That is, the solver views the current problem
state in such a way that it approximates the lesson situation as closely as possible. Thus, it
minimizes the risk of fetching the wrong objects. This way of using fetch patterns is captured in
two hypotheses:

Fetch pattern matching

Let θ be a binding of the fetch pattern's variables to objects of the current problem state
such that all input variables have their correct bindings. Let Pθ be the subset of relations
in fetch pattern P that are true (satisfied) by θ. Then θ is a valid match for the fetch
pattern only if (1) there is no θ' such that Pθ ⊂ Pθ' and Pθ ≠ Pθ', and (2) all the part-
whole relations of P are in Pθ.

Ambiguity impasse 

If a fetch pattern matches ambiguously so that the output variables have two or more
distinct bindings, then an ambiguity impasse is triggered.
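
The two matching hypotheses can be sketched together. The code below is an illustration under assumptions: the problem state is a set of ground facts, a pattern is a set of relations over variables, "part-of" stands in for the part-whole relations, and the exception class, function names, and example objects are all hypothetical.

```python
from itertools import product


class AmbiguityImpasse(Exception):
    """Raised when the output variables have two or more distinct bindings."""


def satisfied(pattern, binding, facts):
    """Return the subset of the pattern's relations that hold under the binding."""
    return frozenset(r for r in pattern
                     if (r[0], *(binding[v] for v in r[1:])) in facts)


def fetch(pattern, inputs, out_vars, objects, facts, part_whole=("part-of",)):
    required = {r for r in pattern if r[0] in part_whole}
    scored = []
    for objs in product(objects, repeat=len(out_vars)):
        binding = {**inputs, **dict(zip(out_vars, objs))}
        scored.append((binding, satisfied(pattern, binding, facts)))
    # Valid: all part-whole relations hold and no other binding satisfies strictly more.
    valid = [b for b, sat in scored
             if required <= sat and not any(sat < other for _, other in scored)]
    outputs = {tuple(b[v] for v in out_vars) for b in valid}
    if len(outputs) > 1:
        raise AmbiguityImpasse(sorted(outputs))
    return outputs.pop() if outputs else None


facts = {("part-of", "t1", "c1"), ("part-of", "b1", "c1"), ("part-of", "t2", "c2"),
         ("is-top", "t1"), ("is-top", "t2"), ("is-zero", "t2")}
objects = ["t1", "b1", "t2"]
pattern = {("part-of", "cell", "col"), ("is-top", "cell"), ("is-zero", "cell")}

closest = fetch(pattern, {"col": "c1"}, ["cell"], objects, facts)
print(closest)
```

In this example no cell of column c1 is a zero, so the pattern does not apply exactly, yet the top cell t1 is still fetched as the closest match. Dropping the pattern to just the part-of relation leaves both cells of c1 as valid matches and raises the impasse.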

The second hypothesis reflects the view that disambiguation is the main job of a fetch pattern. If it
fails, then local problem solving will have to take over.

Test patterns, on the other hand, clearly have a different role. They must report true or false.
The only way to achieve this function (and still generate certain critical bugs, the fetch bugs) is to
use exact matching for test patterns:

Test pattern match 

A test pattern is considered to be true if and only if it matches exactly (i.e., all its relations
are true in the current problem state).
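
For contrast with closest-match fetching, exact matching is a one-line check. This sketch assumes the same illustrative encoding of relations and facts as tuples; the function name `matches_exactly` and the example relations are hypothetical.

```python
def matches_exactly(pattern, binding, facts):
    """A test pattern is true only if every one of its relations holds."""
    return all((name, *(binding[v] for v in variables)) in facts
               for name, *variables in pattern)


facts = {("part-of", "t1", "c1"), ("is-top", "t1")}
binding = {"cell": "t1", "col": "c1"}

holds = matches_exactly({("part-of", "cell", "col"), ("is-top", "cell")}, binding, facts)
fails = matches_exactly({("part-of", "cell", "col"), ("is-zero", "cell")}, binding, facts)
print(holds, fails)
```

A single unsatisfied relation makes the test pattern false, whereas the same partial match would still be usable as a fetch.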

Function biases 

The representation level presented a nearly complete formalization of the learner and the
solver. It left six functions undefined:

    SkeletonFilter
    InduceTest
    InduceFetch
    InduceFunctions
    Test
    Fetch

The hypotheses developed in the bias level defined all of these functions except for
InduceFunctions. That bias has not been discussed yet. The evidence is so clear and intuitively
compelling that it is unnecessary to devote a whole chapter to it. The basic idea can be illustrated
with the usual example, learning the BFZ lesson. If the core procedure does not have REGROUP in
it (which occurs when the procedure is learned from a lesson sequence that does not have a special
regrouping lesson), then there are three kids for the new subprocedure: (1) decrementing the
hundreds column, (2) adding ten to the tens column, and (3) decrementing the tens column to nine.
Each of these kids is a call to the OVRWRT goal passing it a location and a number. To generate
the number, each of the three kids has a function nest. The following are the choices for each nest
that are inductively valid:

Decrementing the hundreds:

    1. (Sub (Read ARG1) (One))
    2. (Sub1 (Read ARG1))

Adding ten to the tens:

    1. (Concat (One) (Zero))
    2. (Concat (One) (Read NV1))
    3. (QUOTE 10)

Decrementing the tens column:

    1. (Sub (Read NV1) (One))
    2. (Sub (Read NV1) (Read R47))
    3. (Sub1 (Read NV1))
    4. (QUOTE 9)

The variables, ARG1, NV1 and R47, come from various fetch patterns. It doesn't matter what they
mean. The point is only that each nest is inductively valid in that they are consistent with all
possible examples. Only a bias of some kind can be used to eliminate them. It can be shown that
only the last nest of each of the sets of nests above is empirically correct.

For the first kid, the choice is between Sub and Sub1. Suppose the inducer chooses Sub.
This means that a precondition violation will occur whenever ARG1 is zero, as it will be when the
procedure is applied to solve problems that require borrowing across multiple zeros. In this case,
one of the observed repairs is Backup, which, as in the bug Smaller-From-Larger-Instead-of-Borrow-
From-Zero (see sections 9.1 or A9.1), causes a secondary impasse and further local problem solving.
The secondary impasse also involves Sub, but in the context of answering a column. It is the
familiar T<B impasse. This secondary impasse is solved by a different repair than Backup. So the
same precondition violation, trying to Sub a larger number from a smaller number, is repaired two
different ways. The observed bug is stable. It does this dual repair consistently. Although the
theory allows such flipping back and forth between repairs, it is clear that the stability of the bug is
better modelled if the decrement-zero impasse is a different impasse than the T<B impasse. This
falls out naturally if Sub1 is chosen instead of Sub. This means that the decrement-zero impasse
will be a violation of Sub1's precondition, and the T<B impasse will be a violation of Sub's
precondition. The two impasses are formally distinct. A different patch for each is quite plausible,
and cleanly accounts for the observed bug. Apparently, learners are biased to choose Sub1 over Sub
in the case of the first subprocedure kid.

In the case of the third subprocedure kid, the evidence is stronger but more complicated. The
basic finding is that when the rule deletion operator removes the second rule of the new
subprocedure, the third rule is called to decrement a zero, which would normally be a ten. This
would generate an impasse. However, no bugs corresponding to this impasse have been found.
Consequently, none of the Sub or Sub1 nests are in use, since they would all generate impasses.
The last nest, which is just the constant 9, is evidently the nest that learners prefer.

These facts argue that the bias of learners with respect to function induction is simply to
choose the function nest with the fewest argument places, where a constant counts as having no
argument places. This is captured in the following hypothesis:

Smallest arity

If two subprocedures, A and B, are identical except for a function nest, and the arity of
A's nest is smaller than the arity of B's nest, then InduceFunctions prefers A, where
the arity of a function nest is the sum of the number of argument places in its functions
(i.e., constants and nullary functions count 0, unary functions count 1, binary functions
count 2, etc.).
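
The arity measure can be sketched directly on the nests listed earlier. The nested-tuple encoding and the function names `arity` and `induce_functions` are assumptions made for the illustration; only the counting rule itself comes from the hypothesis.

```python
def arity(nest):
    """Sum the argument places of every function in a nest (constants count 0)."""
    if not isinstance(nest, tuple) or nest[0] == "QUOTE":
        return 0  # a variable name or a quoted constant
    _, *args = nest
    return len(args) + sum(arity(a) for a in args)


def induce_functions(nests):
    """Smallest arity bias: prefer the nest with the fewest argument places."""
    return min(nests, key=arity)


hundreds = [("Sub", ("Read", "ARG1"), ("One",)),
            ("Sub1", ("Read", "ARG1"))]
add_ten = [("Concat", ("One",), ("Zero",)),
           ("Concat", ("One",), ("Read", "NV1")),
           ("QUOTE", 10)]
tens = [("Sub", ("Read", "NV1"), ("One",)),
        ("Sub", ("Read", "NV1"), ("Read", "R47")),
        ("Sub1", ("Read", "NV1")),
        ("QUOTE", 9)]

for nests in (hundreds, add_ten, tens):
    print([arity(n) for n in nests], induce_functions(nests))
```

With this counting, the preferred nest in each set is the last one listed, which agrees with the nests the text reports as empirically correct.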

This is a rather minor bias that has a clear intuitive interpretation. Suppose that executing unary
facts functions requires less use of cognitive resources than executing a binary facts function, and
that retrieving a constant is even easier. The smallest arity bias then means that students prefer
function nests which reduce their cognitive load during execution. So this hypothesis is rather
plausible in addition to having a degree of empirical support.






Chapter 21 
Conclusions 



Danny Bobrow once said that AI researchers tend to stand on the toes of their predecessors
rather than on their shoulders (Bobrow, 1973). I have tried to stand on a few shoulders. In
particular, the argumentative methodology comes from Chomskyan linguistics. The induction
technology comes from Winston and his many successors. The representational framework is
heavily influenced by Newell and Simon's work on production systems. The basic notion of local
problem solving originated with John Seely Brown. Just as I have built on the work of others, I
would like to think that this theory provides something worth building on. This chapter gives a
somewhat personal assessment of the strengths and weaknesses of the theory, and suggests some
directions for future research.

21.1 Strengths and weaknesses in the theory

The architectural level is solid. One of its basic notions is that procedures are interpreted with
the aid of a local problem solver. The existence of local problem solving is indisputable. A great
deal of bug data and especially bug migration data supports the existence of the local problem
solver. Exactly how it works, i.e., the particular set of repairs and impasses, is not perfectly
understood. There is no performance model for the local problem solver. Nonetheless, the basic
notions of local problem solving seem likely to survive a more detailed investigation.

The other fundamental idea of the architecture level is felicity conditions. They are an
analysis of what makes lessons simpler to learn from than random examples. As far as I know, this
theory is the first to address the question of why lessons help the learner learn. Having asked the
question, the answers are fairly obvious. Only the show-work principle was a surprise. It was a
surprise, I suppose, because few AI researchers have looked at the problem of inducing function
compositions. Although inducing disjunction is a well-known problem, inducing nests of functions
has received little attention. Having uncovered the problem, the felicity condition that solves it is
quite apparent. Are there undiscovered felicity conditions? Since Sierra is, in fact, able to induce
procedures, I doubt that there are major undiscovered induction problems. The disjunction
problem and the invisible objects problem seem to be the only "insoluble" induction problems.
Although the architecture level's solutions to these problems, which rely heavily on the lesson
boundaries, might actually turn out to be slightly inaccurate, the fundamental induction problems
can be expected to be permanent features of the theory.

The representational level is somewhat problematical. Given the standard notion of control
flow, data flow and interface, it seems a reasonable analysis of the structure of human procedures.
However, recent work by Brian Smith (1982) indicates that these fixtures of computer science may
not be the only way to understand computation. It is possible that a much better version of the
representation level will be discovered when the new ideas of computation become better
understood. Ideas about computation serve as a set of distinctions for analyzing human procedural
knowledge. Using familiar ideas about computation, one standard distinction was found to be
irrelevant. The applicative hypothesis essentially erases the distinction between control flow and
data flow. It says that the two kinds of information move together. Perhaps there are other
distinctions that, while not familiar ones now, may add to the clarity of the analysis of procedural
knowledge. In particular, the issue of whether focus of attention is intensional or extensional
(which is discussed in appendix 8) seems to be a place where critical distinctions are needed and
lacking.

As part of a theory of learning and local problem solving, the representation language
functions merely as a set of absolute constraints (as opposed to binary or relative constraints, such as
the learner's biases for maximal generality). The representation defines a format for subprocedures,
and the learner just fills in the blanks. The representation defines a runtime state, and the local
problem solver just manipulates it in simple ways. If one puts aside all claims that the language
represents the structure of mentalese, then the veracity of the language lies in whether or not it
correctly draws the boundaries between learnable and unlearnable subprocedures, and between
occurring and non-occurring local problem solving. In both learning and local problem solving, the
assessment of boundaries is complicated by the fact that there are more constraints on the model
than just the representational ones. In learning, the bias constraints filter out most of the
subprocedures that the representation would allow the learner to output. In local problem solving,
only stipulated repairs and impasses are used. Not every possible change to the runtime state is a
sanctioned repair. Despite these complications, the boundaries seem quite well set by the
representation language.

The only place where there is some uncertainty concerns loops. As the lesson traversal of 
section 2.8 shows, a large number of unobserved bugs are generated in the course of learning the 
main column loop. Most of these could be avoided if (1) the representation had some special 
"Foreach" loop construction for processing written lists, such as the list of subtraction columns, and 
(2) the learner was biased to choose it instead of the tail recursive formulation of loops whenever 
both were possible. Unfortunately, the data concerning bugs in the early lessons of subtraction is 
quite sparse (those lessons occur in second grade; the youngest students in the subtraction studies 
were in third grade), and the later stages of subtraction do not apparently provide an opportunity to 
use this "Foreach loop" construction. The existence of the Foreach loop construction seems certain, 
but the details of its formulation have been left undecided until more data is available. Modulo this 
issue, the representation level seems quite solid. 
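
The contrast between the two loop formulations can be made concrete with a small sketch. It is only an illustration in Python, not the theory's representation language: the names `column_step`, `subtract_tail_recursive` and `subtract_foreach` are hypothetical, columns are assumed to be (top, bottom) digit pairs listed right to left, and borrowing is modeled as a flag passed to the next column rather than as a decrement written on the page.

```python
def column_step(top, bottom, borrowed):
    """Process one column; return (answer digit, whether the next column is borrowed from)."""
    top -= 1 if borrowed else 0
    if top < bottom:
        return top + 10 - bottom, True
    return top - bottom, False


def subtract_tail_recursive(columns, borrowed=False, answer=()):
    """Tail recursive formulation: do one column, then recurse on the rest."""
    if not columns:
        return list(answer)
    (top, bottom), rest = columns[0], columns[1:]
    digit, borrowed = column_step(top, bottom, borrowed)
    return subtract_tail_recursive(rest, borrowed, answer + (digit,))


def subtract_foreach(columns):
    """Foreach formulation: a single loop construct ranges over the written list."""
    answer, borrowed = [], False
    for top, bottom in columns:
        digit, borrowed = column_step(top, bottom, borrowed)
        answer.append(digit)
    return answer


# 732 - 484, with columns listed right to left (units first).
columns = [(2, 4), (3, 8), (7, 4)]
print(subtract_tail_recursive(columns))  # [8, 4, 2], i.e. 248
print(subtract_foreach(columns))         # [8, 4, 2]
```

Both functions compute the same answer. They differ only in how the iteration over columns is expressed, which is the choice that the learner's bias would have to settle.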

The bias level is where the greatest uncertainty lurks. Only the system of acquiring and 
matching fetch patterns is well supported. The grammar provides most of the constraints here; the 
fetch bugs vouch that it provides the right ones. However, the biases for inducing skeletal 
subprocedures and test patterns are not very well supported. Teleological rationalizations provide a 
viable alternative that may be able to generate many of the bugs that the topological approach does 
not generate. However, merely providing a subject parameter filled by a set of step schemata is an 
intolerable retreat from explanatory adequacy. A way to reduce tailorability to acceptable levels 
might be to provide a constricting representation language for step schemata. Just as grammars 
embed the universal aspects of students' notational knowledge, there might be a grammatical way to 
embody universal teleological notions. Such universal notions would have to be somewhat domain 
specific in order to have enough strength to explain the existence of the various observed 
teleological rationalizations. I expect that notions of compensation and symmetry would be 
important for written calculations but that cause and effect might be unimportant. 

To summarize: The architectural level seems quite solid. The representation level serves well 
as a source of constraint on learning and local problem solving, but its deeper significance, especially 
its relationship to mentalese, is not yet clear. The bias level seems incomplete. Adding a mini- 
theory of teleological rationalizations may improve it. 

21.2 Directions for future research 

There are many directions for further research. Some were just mentioned. Some others that 
were mentioned in earlier chapters are: 

1. The theory should be applied to other tasks, such as algebra equation solving. 

2. Optimization lessons should be analyzed and incorporated in the theory. 

3. Critics and their acquisition are important issues that need immediate attention. 

4. Grammar acquisition is an issue that seems ripe for studying. 

5. The relationship between slips and deletion needs investigation. It is likely that a rule 
deletion is just a well-practiced slip. 

In terms of applying the theory to education, perhaps the most important and difficult area of 
future research concerns the long-term retention of procedures. People remember much more than 
core procedures. Since some repair-generated bugs are stable, people must be remembering 
patches, the association of a repair with an impasse. One aspect of retention is of key interest to 
educators: how much drill is needed? Should it be lumped or distributed? Currently, textbooks 
use about 50 hours of distributed drill to teach subtraction. Most of that time is spent on review 
lessons. Why? Neves' analysis of algebra review lessons (Neves, 1981; see section 4.3) seems to say 
that students don't remember a complete procedure between lessons, but instead remember a set of 
induction heuristics that enable them to reconstruct the procedure given the brief examples of a 
review lesson. They don't remember the procedure per se; they remember how to learn it. 

What little data there is on long-term bug stability paints a confusing picture. One mystery is 
reversion: the student reverts to using a bug that appeared to have been remediated. 

The practical importance of studying procedure memory is that it would make this theory a 
valuable tool for curriculum design. Currently, the model can be used to establish a minimal 
content for lessons. It determines whether or not a lesson is missing certain kinds of examples and 
exercises. Given some curricular objective for the lesson, it establishes a "requisite variety" for a 
lesson's examples. Although it establishes a lower bound on content, textbook publishers need to 
know an upper bound and an average, as well. They need to know how many lessons to budget for 
a certain objective. Since the present theory cannot tell them this, it is, at best, half a tool. But it is 
still better than no tool at all. 

Not only does the theory critique individual lessons, suggesting examples that should be 
added, it can critique the lesson sequence as a whole. Some textbooks leave out critical lessons (one 
leaves out the borrow-from-zero lesson!) while retaining ones that are unnecessary (according to the 
present theory, which doesn't address memory-related issues). 
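
As a rough illustration of what checking a lesson for missing kinds of examples could look like, the sketch below classifies subtraction exercises by whether they require no borrow, a borrow, or a borrow from zero, and reports the kinds a lesson lacks. It is a toy stand-in, not the model itself, which derives its critique from simulated learning. The function names and the three-way classification are assumptions made for the example.

```python
def exercise_kinds(top, bottom):
    """Classify one exercise (top - bottom, with top >= bottom >= 0) by the borrowing it needs."""
    t = [int(c) for c in str(top)][::-1]     # top digits, units first
    b = [int(c) for c in str(bottom)][::-1]  # bottom digits, units first
    b += [0] * (len(t) - len(b))             # assumption: blanks in the bottom row count as 0
    kinds, borrowed = set(), False
    for i, (td, bd) in enumerate(zip(t, b)):
        current = td - (1 if borrowed else 0)
        borrowed = current < bd              # this column must borrow from the next one
        if borrowed:
            kinds.add("borrow")
            if i + 1 < len(t) and t[i + 1] == 0:
                kinds.add("borrow-from-zero")
    return kinds or {"no-borrow"}


def missing_kinds(lesson, required):
    """Return the required kinds that no exercise in the lesson exhibits."""
    covered = set()
    for top, bottom in lesson:
        covered |= exercise_kinds(top, bottom)
    return required - covered


lesson = [(45, 23), (62, 44), (732, 484)]
required = {"no-borrow", "borrow", "borrow-from-zero"}
print(missing_kinds(lesson, required))  # {'borrow-from-zero'}
```

In this hypothetical lesson, no exercise borrows from a zero, so that kind is reported as missing. Adding an exercise such as 402 - 6 would cover it.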

It could be argued that these applications of the theory will be short lived. As calculators 
become ubiquitous, there may be less need to teach students efficient arithmetic skills. Instead, the 
procedure might be used to introduce concepts more useful in today's society, concepts such as 
procedures, data structures (e.g., base 10 notation), design (teleological semantics) and debugging. 
"Rote" learning of the surface structure of arithmetic procedures may disappear. Nonetheless, there 
will always be many procedures that adults learn by rote, and someone has to design the lessons 
that teach those procedures. Almost every piece of computer software sold today is accompanied by 
a manual which teaches the user how to use it. Whenever a company markets a new machine, its 
service personnel must be trained to repair it. Whenever a bank creates a new kind of account, its 
personnel must be taught new procedures for opening, maintaining and closing the account. In 
most cases, the people learning these procedures are not interested in the deep, teleological structure 
that underlies the procedures' design; they just want to know what steps to follow. "Rote" 
learning is all they want and perhaps it is all that they need in order to work efficiently. If this 
theory can be generalized to tasks outside the domain of written symbol manipulation, it may be a 
tool for the training industry to use in rapidly designing curricula to meet the needs of its clients 
and their students. 
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Appendix 1 
A Bug Glossary 



0-N=0/AFTER/BORROW 

When a column has a 1 that was changed to a 0 by a previous borrow, the student writes 0 as the answer to that column 
= 508) 

0-N=0/EXCEPT/AFTER/BORROW 

Thinks 0-N is 0 except when the column has been borrowed from. (906 - 484 = 502) 

0-N=N/AFTER/BORROW 

When a column has a 1 that was changed to a 0 by a previous borrow, the student writes the bottom digit as the answer to 
that column. (512 - 136 = 436) 

0-N=N/EXCEPT/AFTER/BORROW 

Thinks 0-N is N except when the column has been borrowed from. (906 - 484 = 582) 

1-1=0/AFTER/BORROW 

If a column starts with 1 in both top and bottom and is borrowed from, the student writes 0 as the answer to that column. 
(812 - 518 = 304) 

1-1=1/AFTER/BORROW 

If a column starts with 1 in both top and bottom and is borrowed from, the student writes 1 as the answer to that column. 
(812 - 518 = 314) 

ADD/BORROW/CARRY/SUB 

The student adds instead of subtracting, but he subtracts the carried digit instead of adding it. 
(54 - 38 = 72) 

ADD/BORROW/DECREMENT 

Instead of decrementing, the student adds 1, carrying to the next column if necessary. 
  8 6 3      8 9 3 
- 1 3 4    - 1 0 4 
  7 4 9      8 0 9 

ADD/BORROW/DECREMENT/WITHOUT/CARRY 

Instead of decrementing, the student adds 1. If this addition results in 10, the student does not carry but simply writes both 
digits in the same space. 
  8 6 3      8 9 3 
- 1 3 4    - 1 0 4 
  7 4 9     7 10 9 

ADD/INSTEADOF/SUB 

The student adds instead of subtracts. (32 - 15 = 47) 

ADD/LR/DECREMENT/ANSWER/CARRY/TO/RIGHT 

Adds columns from left to right instead of subtracting. Before writing the column's answer, it is decremented and 
truncated to the units digit. A one is added into the next column to the right. 
(411 - 215 = 527) 

ADD/NOCARRY/INSTEADOF/SUB 

The student adds instead of subtracting. If carrying is required, he does not add the carried digit. (47 - 25 = 63) 

ALWAYS/BORROW 

The student borrows in every column regardless of whether it is necessary. (488 - 229 = 1159) 

ALWAYS/BORROW/LEFT 

The student borrows from the leftmost digit instead of borrowing from the digit immediately to the left. (733 - 216 = 427) 

BLANK/INSTEADOF/BORROW 

When a borrow is needed, the student simply skips the column and goes on to the next. 
(415 - 283 = 22) 

BORROW/ACROSS/TOP/SMALLER/DECREMENTING/TO 

When decrementing a column in which the top is smaller than the bottom, the student adds 10 to the top digit, decrements 
the column being borrowed into, and borrows from the next column to the left. Also, the student skips any column which 
has a 0 over a 0 or a blank in the borrowing process. 
1 S 3* 5 13 

i 5 4 

BORROW/ACROSS/ZERO 

When borrowing across a 0, the student skips over the 0 to borrow from the next column. If this causes him to have to 
borrow twice, he decrements the same number both times. 
  9 0 4      9 0 4 

BORROW/ACROSS/ZERO/OVER/BLANK 

When borrowing across a 0 over a blank, the student skips to the next column. (402 - 6 = 306) 

BORROW/ACROSS/ZERO/OVER/ZERO 

Instead of borrowing across a 0 that is over a 0, the student does not change the 0 but decrements the next column to the 
left instead. (702 - 304 = 308) 

BORROW/ADD/DECREMENT/INSTEADOF/ZERO 

Instead of borrowing across a 0, the student changes the 0 to 1 and doesn't decrement any column to the left. (307 - 
108 = 219) 

BORROW/ADD/IS/TEN 

The student changes the number that causes the borrow into 10 instead of adding 10 to it. (83 - 29 = 51) 

BORROW/DECREMENTING/TO/BY/EXTRAS 

When there is a borrow across 0's, the student does not add 10 to the column he is doing but instead adds 10 minus the 
number of 0's borrowed across. 
  3 0 8      3 0 0 8 
- 1 3 9    - 1 3 5 9 
  1 6 8      1 6 4 7 

BORROW/DIFF/0-N=N&SMALL-LARGE=0 

The student doesn't borrow. For columns of the form 0-N, he writes N as the answer. Otherwise he writes 0. (304 - 179 
= 270) 

BORROW/DON'T/DECREMENT/TOP/SMALLER 

The student will not decrement a column if the top number is smaller than the bottom number. 
  7 3 2      7 3 2 
- 4 8 4    - 4 3 4 
  2 5 8      2 9 8 
  Wrong      Correct 

BORROW/DON'T/DECREMENT/UNLESS/BOTTOM/SMALLER 

The student will not decrement a column unless the bottom number is smaller than the top number. 
  7 3 2      7 3 2 
- 4 8 4    - 4 3 4 
  2 5 8      3 0 8 

BORROW/FROM/ALL/ZERO 

Instead of borrowing across 0's, the student changes all the 0's to 9's but does not continue borrowing from the column to 
the left. (3006 - 1807 = 2199) 

BORROW/FROM/BOTTOM 

The student borrows from the bottom row instead of the top one. 
8 7 8 2 7 

' ; 8 '208 

BORROW/FROM/BOTTOM/INSTEADOF/ZERO 

When borrowing from a column of the form 0-N, the student decrements the bottom number instead of the 0. 
  6 0 8      1 0 8 
- 2 4 9    -   4 9 
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im-294 = 598) 

BORROW/FROM/ONE/IS/NINE 

When borrowing from a 1, the student treats the 1 as if it were 10, decrementing it to a 9. 
(316 - 139 = 267) 

BORROW/FROM/ONE/IS/TEN 

When borrowing from a 1, the student changes the 1 to 10 instead of to 0. (414 - 277 = 237) 

BORROW/FROM/ZERO 

Instead of borrowing across a 0, the student changes the 0 to 9 but does not continue borrowing from the column to the left. 
  3 0 6      3 0 0 6      1 0 3 
- 1 8 7    - 1 8 0 7    -   4 5 
  2 1 9      1 2 9 9      1 5 8 

BORROW/FROM/ZERO&LEFT/TEN/OK 

Instead of borrowing across a 0, the student changes the 0 to 9 but does not continue borrowing from the column to the left. 
However, if the digit to the left of the 0 is over a blank, then the student does the correct thing. 
  3 0 6      3 0 0 6      1 0 3      2 0 3 
- 1 8 7    - 1 8 0 7    -   4 5    -   4 5 
  2 1 9      1 2 9 9        5 8      1 5 8 
  Wrong      Wrong        Correct    Correct 

BORROW/FROM/ZERO/IS/TEN 

When borrowing across 0, the student changes the 0 to 10 and does not decrement any digit to the left. (604 - 235 = 479) 

BORROW/IGNORE/ZERO/OVER/BLANK 

When borrowing across a 0 over a blank, the student treats the column with the zero as if it weren't there. 
  5 0 5      5 0 8 
-     7    -     7 
    4 8      5 0 1 
  Wrong      Correct 

BORROW/INTO/ONE=TEN 

When a borrow is caused by a 1, the student changes the 1 to a 10 instead of adding 10 to it. 
(71 - 38 = 32) 

BORROW/NO/DECREMENT 

When borrowing, the student adds 10 correctly but doesn't change any column to the left. 
(62 - 44 = 28) 

BORROW/NO/DECREMENT/EXCEPT/LAST 

Decrements only in the last column of the problem. (6262 - 4444 = 1828) 

BORROW/ONCE/THEN/SMALLER/FROM/LARGER 

The student will borrow only once per exercise. From then on he subtracts the smaller from the larger digit in each column 
regardless of their positions. (7127 - 2389 = 5278) 

BORROW/ONCE/WITHOUT/RECURSE 

The student will borrow only once per problem. After that, if another borrow is required, the student adds the 10 correctly 
but does not decrement. If there is a borrow across a 0, the student changes the 0 to 9 but does not decrement the digit to 
the left of the 0. 
  5 3 5      4 0 8 
- 2 7 8    - 2 3 9 
  3 5 7      2 6 9 

BORROW/ONLY/FROM/TOP/SMALLER 

When borrowing, the student tries to find a column in which the top number is smaller than the bottom. If there is one, he 
decrements that; otherwise he borrows correctly. 
(9283 - 3566 = 5627) 

BORROW/ONLY/ONCE 

When there are several adjacent borrows, the student decrements only with the first borrow. 
(535 - 278 = 357) 

BORROW/SKIP/EQUAL 

When decrementing, the student skips over columns in which the top digit and the bottom digit are the same. (923 - 427 = 
406) 

BORROW/TEN/PLUS/NEXT/DIGIT/INTO/ZERO 

When a borrow is caused by a 0, the student does not add 10 correctly. What he does instead is add 10 plus the digit in the 
next column to the left. (50 - 38 = 17) 

BORROW/TREAT/ONE/AS/ZERO 

When borrowing from a 1, the student treats the 1 as if it were 0; that is, he changes the 1 to 9 and decrements the number to 
the left of the 1. (313 - 159 = 144) 

BORROW/UNIT/DIFF 

The student borrows the difference between the top digit and the bottom digit of the current column. In other words, he 
borrows just enough to do the subtraction, which then always results in 0. (86 - 29 = 30) 

BORROW/WONT/RECURSE 

Instead of borrowing across a 0, the student stops doing the exercise. (8035 - 2662 = 3) 

BORROWED/FROM/DON'T/BORROW 

When there are two borrows in a row, the student does the first borrow correctly but with the second borrow he does not 
decrement (he does add 10 correctly). (143 - 88 = 155) 

CAN'T/SUBTRACT 

The student skips the entire problem. (8 - 3 = ) 

COPY/TOP/IN/LAST/COLUMN/IF/BORROWED/FROM 

After borrowing from the last column, the student copies the top digit as the answer. (80 - 34 = 76) 

DECREMENT/ALL/ON/MULTIPLE/ZERO 

When borrowing across a 0 and the borrow is caused by 0, the student changes the right 0 to 9 instead of 10. (600 - 142 
= 457) 

DECREMENT/BY/ONE/PLUS/ZEROS 

When there is a borrow across zero, decrements the number to the left of the zero(s) by an extra one for every zero 
borrowed across. (4005 - 6 = 1999) 

DECREMENT/BY/TWO/OVER/TWO 

When borrowing from a column of the form N - 2, the student decrements the N by 2 instead of 1. (83 - 29 = 44) 

DECREMENT/LEFTMOST/ZERO/ONLY 

When borrowing across two or more 0's, the student changes the leftmost of the row of 0's to 9 but changes the other 0's to 
10's. He will give answers like (1003 - 958 = 1055) 

DECREMENT/MULTIPLE/ZEROS/BY/NUMBER/TO/LEFT 

When borrowing across 0's, the student changes the leftmost 0 to a 9, changes the next 0 to 8, etc. (8002 - 1714 = 6278) 

DECREMENT/MULTIPLE/ZEROS/BY/NUMBER/TO/RIGHT 

When borrowing across 0's, the student changes the rightmost 0 to a 9, changes the next 0 to 8, etc. (8002 - 1714 = 6188) 

DECREMENT/ON/FIRST/BORROW 

The first column that requires a borrow is decremented before the column subtract is done. 
(832 - 265 = 566) 

DECREMENT/ONE/TO/ELEVEN 

Instead of decrementing a 1, the student changes the 1 to an 11. (314 - 6 = 2118) 

DECREMENT/TOP/LEQ/IS/EIGHT 

When borrowing from 0 or 1, changes the 0 or 1 to 8; does not decrement the digit to the left of the 0 or 1. (4013 - 995 = 3778) 

DIFF/0-N=0 

When the student encounters a column of the form 0 - N, he doesn't borrow; instead he writes 0 as the column answer. 
(40 - 21 = 20) 

DIFF/0-N=N 

When the student encounters a column of the form 0 - N, he doesn't borrow. Instead he writes N as the answer. (80 - 27 = 
67) 

DIFF/0-N=N/WHEN/BORROW/FROM/ZERO 

When borrowing across a 0 and the borrow is caused by a 0, the student doesn't borrow. Instead he writes the bottom 
number as the column answer. He will borrow correctly in the next column or in other circumstances. 
1 0 0      4 0 0 

3 ' 2 4 ft 

DIFF/1-N=1 

When a column has the form 1 - N, the student writes 1 as the column answer. (51 - 27 = 31) 

DIFF/N-0=0 

The student thinks that N - 0 is 0. (57 - 20 = 30) 

DIFF/N-N=N 

Whenever there is a column that has the same number on the top and the bottom, the student writes that number as the 
answer. (83 - 13 = 73) 

DOESN'T/BORROW 

The student stops doing the exercise when a borrow is required. (833 - 262 = 1) 

DON'T/DECREMENT/SECOND/ZERO 

When borrowing across a 0 and the borrow is caused by a 0, the student changes the 0 he is borrowing across into a 10 
instead of a 9. (700 - 258 = 452) 

DON'T/DECREMENT/ZERO 

When borrowing across a 0, the student changes the 0 to 10 instead of 9. (506 - 318 = 198) 

DON'T/DECREMENT/ZERO/OVER/BLANK 

The student will not borrow across a zero that is over a blank. (305 - 9 = 306) 

DON'T/DECREMENT/ZERO/OVER/ZERO 

The student will not borrow across a zero that is over a zero. (305 - 107 = 308) 

DON'T/DECREMENT/ZERO/UNTIL/BOTTOM/BLANK 

When borrowing across a 0, the student changes the 0 to a 10 instead of a 9 unless the 0 is over a blank, in which case he 
does the correct thing. 

  5 0 6      3 0 4 
- 3 1 8    -     9 

  Wrong      Correct 

DON'T/WRITE/ZERO 

Doesn't write zeros in the answer. (24 - 14 = 1) 

DOUBLE/DECREMENT/ONE 

When borrowing from a 1, the student treats the 1 as a 0 (changes the 1 to 9 and continues borrowing to the left). (813 - 515 
= 288) 

FORGET/BORROW/OVER/BLANKS 

The student doesn't decrement a number that is over a blank. (347 - 9 = 348) 

IGNORE/LEFTMOST/ONE/OVER/BLANK 

When the left column of the exercise has a 1 that is over a blank, the student ignores that column. (143 - 22 = 21) 

IGNORE/ZERO/OVER/BLANK 

Whenever there is a column that has a 0 over a blank, the student ignores that column. (907 - 5 = 92) 

INCREMENT/OVER/LARGER 

When borrowing from a column in which the top is smaller than the bottom, the student increments instead of 
decrementing. (833 - 277 = 576) 

INCREMENT/ZERO/OVER/BLANK 

When borrowing across a 0 over a blank, the student increments the 0 instead of decrementing. 
(402 - 6 = 416) 

N-9=N-1/AFTER/BORROW 

If a column is of the form N - 9 and has been borrowed from, when the student does that column he subtracts 1 instead of 
subtracting 9. = 127) 

N-N/AFTER/BORROW/CAUSES/BORROW 

Borrows with columns of the form N-N if the column has been borrowed from. (953 - 147 = 7106) 

N-N/CAUSES/BORROW 

Borrows with columns of the form N-N. (953 - 152 = 7101) 

N-N=1/AFTER/BORROW 

If a column had the form N - N and was borrowed from, the student writes 1 as the answer to that column. (944 - 348 = 
616) 

N-N=9/PLUS/DECREMENT 

When a column has the same number on the top and the bottom, the student writes 9 as the answer and decrements the 
next column to the left even though borrowing is not necessary. 

-. 59J 

ONCE/BORROW/ALWAYS/BORROW 

Once a student has borrowed, he continues to borrow in every remaining column in the exercise. (488 - 229 = 1159) 

QUIT/WHEN/BOTTOM/BLANK 

When the bottom number has fewer digits than the top number, the student quits as soon as the bottom number runs out. 
(439 - 4 = 5) 

SIMPLE/PROBLEM/STUTTER/SUBTRACT 

When the bottom number is a single digit and the top number has two or more digits, the student repeatedly subtracts the 
single bottom digit from each digit in the top number. 
(348 - 2 = 126) 

SMALLER/FROM/LARGER 

The student doesn't borrow; in each column he subtracts the smaller digit from the larger one. 
(81 - 38 = 57) 

SMALLER/FROM/LARGER/INSTEAD/OF/BORROW/FROM/ZERO 

The student does not borrow across 0. Instead he will subtract the smaller from the larger digit. 
  3 0 6      3 0 6 
-     8    - 1 4 8 

SMALLER/FROM/LARGER/WHEN/BORROWED/FROM 

When there are two borrows in a row, the student does the first one correctly, but for the second one he does not borrow; 
instead he subtracts the smaller from the larger digit, regardless of order. (824 - 157 = 747) 

SMALLER/FROM/LARGER/WITH/BORROW 

When borrowing, the student decrements correctly, then subtracts the smaller digit from the larger as if he had not 
borrowed at all. (73 - 24 = 411) 

STOPS/BORROW/AT/MULTIPLE/ZERO 

Instead of borrowing across several 0's, the student adds 10 to the column he's doing but doesn't change any column to the 
left. (4004 - 9 = 4005) 

STOPS/BORROW/AT/SECOND/ZERO 

When borrowing across several 0's, changes the right 0 to 9 but not the other 0's. (4004 - 9 = 4095) 

STOPS/BORROW/AT/ZERO 

Instead of borrowing across a 0, the student adds 10 to the column he's doing but doesn't decrement from a column to the 
left. (404 - 187 = 227) 

STUTTER/SUBTRACT 

When there are blanks in the bottom number, the student subtracts the leftmost digit of the bottom number in every 
column that has a blank. (4369 - 22 = 2147) 

SUB/BOTTOM/FROM/TOP 

The student always subtracts the top digit from the bottom digit. If the bottom digit is smaller, he decrements the top digit 
and adds 10 to the bottom before subtracting. If the bottom digit is zero, however, he writes the top digit in the answer. If 
the top digit is 1 greater than the bottom, he writes 9. He will give answers like this: (4723 - 306? = 974?) 

SUB/COPY/LEAST/BOTTOM/MOST/TOP 

The student does not subtract. Instead he copies digits from the exercise to fill in the answer space. He copies the leftmost 
digit from the top number and the other digits from the bottom number. He will give answers like this: (648 - 231 = 631) 

SUB/ONE/OVER/BLANKS 

When there are blanks in the bottom number, the student subtracts 1 from the top digit. 
(548 - 2 = 436) 

TREAT/TOP/ZERO/AS/NINE 

In a 0 - N column, the student doesn't borrow. Instead he treats the 0 as if it were a 9. (30 - 4 = 39) 

TREAT/TOP/ZERO/AS/TEN 

In a 0 - N column, the student adds 10 to it correctly but doesn't change any column to the left. (40 - 27 = 23) 

X-N=0/AFTER/BORROW 

If a column has been borrowed from, the student writes zero as its answer. (234 - 115 = 109) 

X-N=N/AFTER/BORROW 

If a column has been borrowed from, the student writes the bottom digit as its answer. (234 - 165 = 169) 

ZERO/AFTER/BORROW 

When a column requires a borrow, the student decrements correctly but writes 0 as the answer. 
(65 - 48 = 10) 

ZERO/INSTEADOF/BORROW/FROM/ZERO 

The student won't borrow if he has to borrow across 0. Instead he will write 0 as the answer to the column requiring the 
borrow. 

  7 0 2      7 0 2 
-     8    - 3 4 8 
  7 0 0      3 6 0 

ZERO/INSTEADOF/BORROW 

The student doesn't borrow; he writes 0 as the answer instead. (42 - 16 = 30) 

Appendix 2 
Observed Bug Sets 

The diagnoses of all tests of all students analyzed by Debuggy fall into the following 
categories: 

No errors                        112    (10%) 
Errors due to slips alone        239    (20%) 
Errors due to bugs (and slips)   375    (33%) 
Unanalyzable                     421    (37%) 
total                           1147   (100%) 

The diagnoses of the students that were analyzed as having bugs are shown, ordered by their 
frequency of occurrence. Diagnoses consisting of more than one bug are shown in parentheses. 
There are 134 distinct diagnoses, of which only 35 occurred more than once. However, these 35 
diagnoses account for 276 of the 375 cases (74%). 

The diagnoses in the appendices sometimes contain coercions. A coercion is a modifier that is 
included in a diagnosis to improve the fit of the bugs to the student's errors. Most often, these 
slightly perturb the definitions of bugs. For example, certain bugs modify the procedure so that on 
occasion it will write column answers that are greater than 9. Some students who have these bugs 
apparently know from addition that there should only be one answer digit per column, so they only 
write the units digit. To capture this, the coercion !Write-Units-Digit-Only is added to the 
diagnoses of such students by Debuggy. Coercions can easily be picked out because their names 
have exclamation points as prefixes. For more on coercions, see (Burton, 1981). 

103 occurrences 
(Smaller-From-Larger) 

34 occurrences 
(Stops-Borrow-At-Zero) 

13 occurrences 
(Borrow-Across-Zero) 

10 occurrences 
(Borrow-From-Zero) 
(Borrow-No-Decrement) 

7 occurrences 
(Stops-Borrow-At-Zero Diff-0-N=N) 

6 occurrences 
(Always-Borrow-Left) 
(Borrow-Across-Zero !Touched-0-Is-Ten) 
(Borrow-Across-Zero Diff-0-N=N) 
(Borrow-Across-Zero-Over-Zero Borrow-Across-Zero-Over-Blank) 
(Stops-Borrow-At-Zero Borrow-Once-Then-Smaller-From-Larger Diff-0-N=N) 

5 occurrences 
(Borrow-No-Decrement Diff-0-N 

4 occurrences 
(0-N=N-Except-After-Borrow) 
(Borrow-Across-Zero Diff-0-N=0) 
(Diff-0-N=N Zero-Insteadof-Borrow) 
(Borrow-No-Decrement-Except-Last) 
(Don't-Decrement-Zero-Over-Blank) 
(Quit-When-Bottom-Blank Smaller-From-Larger) 

3 occurrences 
(Borrow-Into-One=Ten Stops-Borrow-At-Zero) 
(Decrement-All-On-Multiple-Zero) 
(Decrement-Multiple-Zeros-By-Number-To-Right) 
(Don't-Decrement-Zero) 
(Smaller-From-Larger Ignore-Leftmost-One-Over-Blank) 

2 occurrences 
(0-N=0-After-Borrow) 
(Borrow-Across-Second-Zero) 
(Borrow-Across-Top-Smaller-Decrementing-To) 
(Borrow-Don't-Decrement-Top-Smaller) 
(Borrow-Don't-Decrement-Unless-Bottom-Smaller) 
(Borrow-Only-From-Top-Smaller Borrow-Across-Zero-Over-Zero Borrow-Across-Zero-Over-Blank) 
(Smaller-From-Larger-Instead-Of-Borrow-From-Zero Borrow-Once-Then-Smaller-From-Larger 
    Diff-0-N=N) 
(Smaller-From-Larger-Insteadof-Borrow-Unless-Bottom-Smaller) 
(Stops-Borrow-At-Multiple-Zero) 
(Stops-Borrow-At-Zero Smaller-From-Larger-When-Borrowed-From) 
(Stops-Borrow-At-Zero Diff-0-N=N Smaller-From-Larger-When-Borrowed-From) 
(Stutter-Subtract) 

1 occurrence 
(!Only-Write-Units-Digit N-N-After-Borrow-Causes-Borrow) 
(!Only-Write-Units-Digit Stops-Borrow-At-Multiple-Zero N-N-After-Borrow-Causes-Borrow) 
(!Sub-Units-Special Borrow-Across-Zero Smaller-From-Larger) 
(!Write-Left-Ten Smaller-From-Larger Diff-0-N=0) 
(!Write-Left-Ten Borrow-Across-Second-Zero Diff-0-N=N) 
(!Write-Left-Ten Forget-Borrow-Over-Blanks Diff-0-N=N) 
(!Touched-0-N=N Borrow-Across-Zero Diff-N-0=0) 
(!Touched-0-N=N Borrow-Across-Zero Borrow-Once-Then-Smaller-From-Larger) 
(!Touched-0-N=N Borrow-Across-Zero Borrow-Across-Second-Zero 
    Smaller-From-Larger-When-Borrowed-From) 
(0-N=N-After-Borrow Borrow-Across-Zero-Over-Zero Borrow-Across-Zero-Over-Blank) 
(0-N=N-After-Borrow N-N=1-After-Borrow Smaller-From-Larger-Instead-Of-Borrow-From-Zero) 
(0-N=N-After-Borrow) 
(0-N=N-Except-After-Borrow 1-1=0-After-Borrow) 






(0-N=N-Except-After-Borrow 1-1=1-After-Borrow) 
(1-1=0-After-Borrow) 
(Add-Insteadof-Sub) 
(Add-LR-Decrement-Answer-Carry-To-Right) 
(Blank-Insteadof-Borrow Diff-0-N=N) 
(Borrow-Across-Second-Zero Don't-Write-Zero) 
(Borrow-Across-Second-Zero Borrow-Skip-Equal) 
(Borrow-Across-Zero 0-N=0-Except-After-Borrow) 
(Borrow-Across-Zero 1-1=0-After-Borrow) 
(Borrow-Across-Zero Borrow-Once-Then-Smaller-From-Larger 0-N=N-Except-After-Borrow) 
(Borrow-Across-Zero Sub-One-Over-Blank 0-N=N-After-Borrow 0-N=N-Except-After-Borrow) 
(Borrow-Across-Zero Borrow-Skip-Equal) 
(Borrow-Across-Zero Quit-When-Bottom-Blank 0-N=0-After-Borrow) 
(Borrow-Across-Zero Forget-Borrow-Over-Blanks Diff-0-N=N) 
(Borrow-Across-Zero Ignore-Leftmost-One-Over-Blank Borrow-Skip-Equal) 
(Borrow-Across-Zero !Touched-0-N=N) 
(Borrow-Across-Zero-Over-Zero 0-N=N-Except-After-Borrow 1-1=0-After-Borrow) 
(Borrow-Across-Zero-Over-Zero) 
(Borrow-Don't-Decrement-Unless-Bottom-Smaller X-N=0-After-Borrow) 
(Borrow-Don't-Decrement-Unless-Bottom-Smaller Don't-Write-Zero) 
(Borrow-From-All-Zero) 
(Borrow-From-Bottom-Insteadof-Zero Diff-0-N=N) 
(Borrow-From-One-Is-Nine Borrow-From-Zero Diff-0-N=N-When-Borrow-From-Zero) 
(Borrow-From-One-Is-Nine Borrow-From-Zero Don't-Decrement-Zero-Over-Blank) 
(Borrow-From-One-Is-Ten Borrow-From-Zero-Is-Ten Borrow-Only-Once) 
(Borrow-From-Zero 0-N=0-After-Borrow) 
(Borrow-From-Zero 0-N=N-After-Borrow) 
(Borrow-From-Zero&Left-Ten-Ok 0-N=N-After-Borrow) 
(Borrow-From-Zero&Left-Ten-Ok) 
(Borrow-From-Zero-Is-Ten) 
(Borrow-Into-One=Ten Decrement-Multiple-Zeros-By-Number-To-Left) 
(Borrow-Into-One=Ten Decrement-Multiple-Zeros-By-Number-To-Right 
    Borrow-Across-Zero-Over-Zero) 
(Borrow-No-Decrement Diff-0-N=0) 
(Borrow-No-Decrement Smaller-From-Larger-Except-Last 
    Smaller-From-Larger-Insteadof-Borrow-Unless-Bottom-Smaller Diff-0-N=N) 
(Borrow-No-Decrement Sub-One-Over-Blank) 
(Borrow-No-Decrement-Except-Last Treat-Top-Zero-As-Ten) 
(Borrow-No-Decrement-Except-Last Decrement-Top-Leq-Is-Eight X-N=N-After-Borrow) 
(Borrow-Only-From-Top-Smaller) 
(Borrow-Only-From-Top-Smaller 0-N=N-After-Borrow) 
(Borrow-Treat-One-As-Zero N-N=1-After-Borrow Don't-Decrement-Zero-Over-Blank) 
(Borrow-Unit-Diff Only-Do-Units) 
(Can't-Subtract) 
(Decrement-All-On-Multiple-Zero Double-Decrement-One) 
(Decrement-Leftmost-Zero-Only) 
(Decrement-Multiple-Zeros-By-Number-To-Left) 
(Decrement-Top-Leq-Is-Eight) 
(Diff-0-N=0 Diff-N-0=0 Stops-Borrow-At-Zero) 
(Diff-0-N=0 Diff-N-0=0 Doesn't-Borrow-Except-Last 






Smallcr*KroinMiirgcr-ljistcadof*Ik)rro\^-Unlcss*Bouom*SmaIlcr) 
(1)ifr-0-N=^0 I)ifr-N-0 = 0 Sma!lcr-Prom-Urgcr-Hxccpi4ast 

Sma!lcrKromM^rgcr-ln!^cadof*f)orrow*Unlcss*Rottom-SmalIcr) 
(Diff-0-N=N)
(Diff-0-N=N Diff-N-0=0)
(Diff-0-N=N-When-Borrow-From-Zero Don't-Decrement-Zero)
(Diff-N-0=0 Smaller-From-Larger Diff-0-N=0)
(Don't-Decrement-Zero Borrow-Across-Second-Zero)
(Don't-Decrement-Zero 1-1=0-After-Borrow)
(Don't-Decrement-Zero Decrement-One-To-Eleven)
(Don't-Decrement-Zero-Until-Bottom-Blank Borrow-Across-Zero-Over-Zero)
(Don't-Write-Zero)
(Double-Decrement-One)
(Double-Decrement-One Smaller-From-Larger-When-Borrowed-From)
(Forget-Borrow-Over-Blanks)
(Forget-Borrow-Over-Blanks Borrow-Don't-Decrement-Top-Smaller Borrow-Skip-Equal)
(Ignore-Leftmost-One-Over-Blank Decrement-All-On-Multiple-Zero)
(N-N-Causes-Borrow)
(N-N=1-After-Borrow 0-N=N-Except-After-Borrow)
(N-N=1-After-Borrow)
(Simple-Problem-Stutter-Subtract)
(Smaller-From-Larger Diff-N-N=N Diff-0-N=0)
(Smaller-From-Larger Diff-0-N=0)
(Smaller-From-Larger-Except-Last Decrement-All-On-Multiple-Zero)
(Smaller-From-Larger-Instead-Of-Borrow-From-Zero Diff-0-N=N
 Smaller-From-Larger-When-Borrowed-From)
(Smaller-From-Larger-Instead-Of-Borrow-From-Zero Borrow-Once-Then-Smaller-From-Larger)
(Smaller-From-Larger-Insteadof-Borrow-Unless-Bottom-Smaller 0-N=N-Except-After-Borrow)
(Smaller-From-Larger-Insteadof-Borrow-Unless-Bottom-Smaller Top-Instead-Of-Borrow-From-Zero
 Diff-0-N=N)
(Stops-Borrow-At-Zero 0-N=0-Except-After-Borrow 1-1=0-After-Borrow)
(Stops-Borrow-At-Zero Borrow-Across-Zero-Over-Zero 1-1=0-After-Borrow)
(Stops-Borrow-At-Zero 1-1=1-After-Borrow)
(Stops-Borrow-At-Zero Diff-0-N=0)
(Stops-Borrow-At-Zero Ignore-Leftmost-One-Over-Blank)
(Stops-Borrow-At-Zero 0-N=0-After-Borrow)
(Stops-Borrow-At-Zero Borrow-Once-Then-Smaller-From-Larger)
(Stops-Borrow-At-Zero 1-1=0-After-Borrow)
(Stops-Borrow-At-Zero Diff-0-N=N Don't-Write-Zero)
(Sub-Bottom-From-Top)
(Sub-Copy-Least-Bottom-Most-Top)
(Zero-Insteadof-Borrow)

Appendix 3
Bug Occurrence Frequencies

This appendix lists each bug and coercion in Debuggy's database (see the previous
appendix for an explanation of coercions). It indicates how many times the bug has occurred, if any,
in Debuggy's analyses of the Southbay data. The first column, labelled "alone," is the number of
times the bug occurred alone, as the only element of the diagnosis. The second column, labelled
"cmd.," is the number of times the bug occurred as part of a multi-bug diagnosis, or "compound"
bug as it was called in (Brown & Burton, 1978). The third column, labelled "gen.," has a "$" mark
if the bug was generated by Sierra during the Southbay run. Thus, for example, the bug 1-1=0-
After-Borrow occurred once alone, and seven times as part of a larger diagnosis, but is not one of
the bugs that Sierra predicted. Rows that would be all zeros have been left blank to highlight those
bugs in the database which never occurred in these studies. The data come from the reanalysis
that was performed after the new bugs generated by Sierra were entered in the database. There are
128 bugs and 15 coercions in the database. Of these, 75 bugs and 5 coercions occurred at least
once. Sierra generated 49 bugs and 6 coercions.
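
The two count columns can be illustrated with a short sketch. It assumes, hypothetically, that each diagnosis is available as a set of bug and coercion names; the function name `tally` and the toy diagnoses are invented for illustration and are not Debuggy's code or the Southbay data.

```python
from collections import Counter

def tally(diagnoses):
    """Return (alone, compound) counters over a list of diagnoses.

    A bug is counted as "alone" when it is the only element of a diagnosis,
    and as "compound" (the "cmd." column) when the diagnosis has several
    elements. Coercions are treated like any other element (an assumption).
    """
    alone, compound = Counter(), Counter()
    for diagnosis in diagnoses:
        target = alone if len(diagnosis) == 1 else compound
        for bug in diagnosis:
            target[bug] += 1
    return alone, compound

# Toy diagnoses, invented for illustration only.
diagnoses = [
    {"Smaller-From-Larger"},
    {"Stops-Borrow-At-Zero"},
    {"Stops-Borrow-At-Zero", "Diff-0-N=N"},
    {"Stops-Borrow-At-Zero", "Diff-0-N=N"},
]
alone, compound = tally(diagnoses)
for bug in sorted(set(alone) | set(compound)):
    print(f"{alone[bug]:5d} {compound[bug]:5d}  {bug}")
```

A bug that never appears in any diagnosis has zero in both counters, which corresponds to a row left blank in the tables below.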

alone  cmd. gen. Coercion

                 !Borrow-Diff-Abs-Over-Blank
                 !Forget-To-Write-Units-Digit
                 !Last-Column-Special-Sub
                 !Last-Full-Column-Special-Sub
                 !N-0=N-Always
    0     2      !Only-Write-Units-Digit
    0     1      !Sub-Units-Special
    0     3      !Write-Left-Ten
                 !Zero-Minus-Blank-Is-Zero
             $   !Touched-0-N=0
             $   !Touched-0-N=Blank
    0     4  $   !Touched-0-N=N
    0     6  $   !Touched-0-Is-Ten
             $   !Touched-0-Is-Quit
             $   !Touched-Double-Zero-Is-Quit



alone  cmd. gen. Bug

    2     3      0-N=0-After-Borrow
    0     2      0-N=0-Except-After-Borrow
    1     6      0-N=N-After-Borrow
    4     7      0-N=N-Except-After-Borrow
    1     7      1-1=0-After-Borrow
    0     2      1-1=1-After-Borrow
                 Add-Borrow-Decrement
                 Add-Borrow-Decrement-Without-Carry
    1     0      Add-Insteadof-Sub
    1     0      Add-Lr-Decrement-Answer-Carry-To-Right
                 Add-Nocarry-Insteadof-Sub
                 Always-Borrow
    6     0  $   Always-Borrow-Left
             $   Blank-Instead-Of-Borrow-From-Zero
    0     1  $   Blank-Insteadof-Borrow
             $   Blank-Insteadof-Borrow-Except-Last
             $   *Blank-Insteadof-Borrow-From-Double-Zero
             $   Blank-Insteadof-Borrow-Unless-Bottom-Smaller
             $   *Blank-With-Borrow
    2     5  $   Borrow-Across-Second-Zero
    2     0      Borrow-Across-Top-Smaller-Decrementing-To
   13    29  $   Borrow-Across-Zero
    0     9      Borrow-Across-Zero-Over-Blank
    1    13      Borrow-Across-Zero-Over-Zero
             $   Borrow-Add-Decrement-Insteadof-Zero
                 Borrow-Add-Is-Ten
                 Borrow-Decrementing-To-By-Extras
    2     1      Borrow-Don't-Decrement-Top-Smaller
    2     2  $   Borrow-Don't-Decrement-Unless-Bottom-Smaller
    1     0      Borrow-From-All-Zero
                 Borrow-From-Bottom
    0     1      Borrow-From-Bottom-Insteadof-Zero
                 Borrow-From-Larger
    0     2  $   Borrow-From-One-Is-Nine
    0     1  $   Borrow-From-One-Is-Ten
   10     4  $   Borrow-From-Zero
    1     1      Borrow-From-Zero&Left-Ten-Ok
    1     1  $   Borrow-From-Zero-Is-Ten
                 Borrow-Ignore-Zero-Over-Blank
    0     5      Borrow-Into-One=Ten
   10     8  $   Borrow-No-Decrement
    4     2  $   Borrow-No-Decrement-Except-Last
    0    12      Borrow-Once-Then-Smaller-From-Larger
                 Borrow-Once-Without-Recurse
    1     3      Borrow-Only-From-Top-Smaller
    0     1      Borrow-Only-Once
    0     4      Borrow-Skip-Equal
                 Borrow-Ten-Plus-Next-Digit-Into-Zero
    0     1  $   Borrow-Treat-One-As-Zero
    0     1      Borrow-Unit-Diff
             $   Borrow-Wont-Recurse
             $   Borrow-Wont-Recurse-Twice
                 Borrowed-From-Don't-Borrow
    1     0      Can't-Subtract
             $   Copy-Top-Except-Units
                 Copy-Top-In-Last-Column-If-Borrowed-From
    3     3      Decrement-All-On-Multiple-Zero
                 Decrement-By-One-Plus-Zeros
                 Decrement-By-Two-Over-Two
    1     0      Decrement-Leftmost-Zero-Only
    1     1      Decrement-Multiple-Zeros-By-Number-To-Left
    3     1      Decrement-Multiple-Zeros-By-Number-To-Right
                 Decrement-On-First-Borrow
    0     1      Decrement-One-To-Eleven
                 Decrement-One-To-Eleven-And-Continue
    1     1      Decrement-Top-Leq-Is-Eight
    0    13      Diff-0-N=0
    1    42      Diff-0-N=N
    0     2      Diff-0-N=N-When-Borrow-From-Zero
                 Diff-1-N=1
    0     6      Diff-N-0=0
    0     1      Diff-N-N=N
    0     1  $   Doesn't-Borrow-Except-Last
             $   Doesn't-Borrow-Unless-Bottom-Smaller
             $   Doesnt-Borrow
                 Don't-Decrement-Second-Zero
    3     4  $   Don't-Decrement-Zero
    4     2      Don't-Decrement-Zero-Over-Blank
    0     1      Don't-Decrement-Zero-Until-Bottom-Blank
    1     3      Don't-Write-Zero
    1     2      Double-Decrement-One
    1     3  $   Forget-Borrow-Over-Blanks
    0     6      Ignore-Leftmost-One-Over-Blank
                 Ignore-Zero-Over-Blank
                 Increment-Over-Larger
             $   Increment-Zero-Over-Blank
                 Mix-Up-Six-And-Nine
                 N-9=N-1-After-Borrow
    0     2      N-N-After-Borrow-Causes-Borrow
    1     0  $   N-N-Causes-Borrow
    1     3      N-N=1-After-Borrow
                 N-N=3-Plus-Decrement
                 Once-Borrow-Always-Borrow
             $   *Only-Do-First&Last-Columns
    0     1  $   Only-Do-Units
             $   *Only-Do-Units&Tens
             $   *Only-Do-Units-Unless-Two-Columns
    0     5  $   Quit-When-Bottom-Blank
             $   *Quit-When-Second-Bottom-Blank
    1     0      Simple-Problem-Stutter-Subtract
             $   *Skip-Interior-Bottom-Blank
  103    12  $   Smaller-From-Larger
    0     3  $   Smaller-From-Larger-Except-Last
    0     5  $   Smaller-From-Larger-Instead-Of-Borrow-From-Zero
             $   Smaller-From-Larger-Insteadof-Borrow-From-Double-Zero
    2     5  $   Smaller-From-Larger-Insteadof-Borrow-Unless-Bottom-Smaller
    0     7      Smaller-From-Larger-When-Borrowed-From
             $   Smaller-From-Larger-With-Borrow
    2     1  $   Stops-Borrow-At-Multiple-Zero
             $   Stops-Borrow-At-Second-Zero
   34    30  $   Stops-Borrow-At-Zero
    2     0      Stutter-Subtract
    1     0      Sub-Bottom-From-Top
    1     0      Sub-Copy-Least-Bottom-Most-Top
    0     2      Sub-One-Over-Blank
             $   Top-After-Borrow
    0     1  $   Top-Instead-Of-Borrow-From-Zero
                 Top-Insteadof-Borrow
             $   Top-Insteadof-Borrow-Except-Last
             $   Top-Insteadof-Borrow-From-Double-Zero
             $   Top-Insteadof-Borrow-Unless-Bottom-Smaller
                 Treat-Top-Zero-As-Nine
    0     1      Treat-Top-Zero-As-Ten
    0     1      X-N=0-After-Borrow
    0     1      X-N=N-After-Borrow
                 Zero-After-Borrow
                 Zero-Instead-Of-Borrow-From-Zero
    1     4      Zero-Insteadof-Borrow

Appendix 4
Sierra's predicted bug sets

This appendix lists all the bug sets generated by Sierra for the Southbay experiment.

(!Touched-0-N=0 Borrow-Across-Zero !Touched-Double-Zero-Is-Quit N-N-Causes-Borrow)
(!Touched-0-N=Blank Borrow-Across-Zero *Quit-When-Second-Bottom-Blank)
(!Touched-0-N=Blank Borrow-Across-Zero !Touched-Double-Zero-Is-Quit N-N-Causes-Borrow)
(!Touched-0-N=N Borrow-Across-Zero *Quit-When-Second-Bottom-Blank
 !Touched-Double-Zero-Is-Quit)
(!Touched-0-N=N Borrow-Across-Zero !Touched-Double-Zero-Is-Quit N-N-Causes-Borrow)
(!Touched-0-Is-Quit Borrow-Across-Zero Borrow-Wont-Recurse-Twice)
(!Touched-0-Is-Quit Borrow-Across-Zero *Quit-When-Second-Bottom-Blank)
(!Touched-0-Is-Quit Borrow-Across-Zero N-N-Causes-Borrow)
(!Touched-0-Is-Quit Borrow-Across-Zero *Skip-Interior-Bottom-Blank)
(!Touched-0-Is-Ten Borrow-Across-Zero Borrow-Wont-Recurse-Twice)
(!Touched-0-Is-Ten Borrow-Across-Zero *Quit-When-Second-Bottom-Blank
 !Touched-Double-Zero-Is-Quit)
(!Touched-0-Is-Ten Borrow-Across-Zero !Touched-Double-Zero-Is-Quit N-N-Causes-Borrow)
(Blank-Instead-Of-Borrow-From-Zero)
(Blank-Instead-Of-Borrow-From-Zero N-N-Causes-Borrow)
(Blank-Insteadof-Borrow *Only-Do-Units&Tens *Only-Do-Units-Unless-Two-Columns)
(Blank-Insteadof-Borrow *Only-Do-First&Last-Columns Quit-When-Bottom-Blank)
(Blank-Insteadof-Borrow *Only-Do-First&Last-Columns)
(Blank-Insteadof-Borrow-Except-Last)
(*Blank-Insteadof-Borrow-From-Double-Zero)
(Blank-Insteadof-Borrow-Unless-Bottom-Smaller)
(*Blank-With-Borrow Blank-Instead-Of-Borrow-From-Zero)
(*Blank-With-Borrow Borrow-Wont-Recurse)
(*Blank-With-Borrow Borrow-Add-Decrement-Insteadof-Zero)
(*Blank-With-Borrow Top-Instead-Of-Borrow-From-Zero)
(*Blank-With-Borrow Smaller-From-Larger-Instead-Of-Borrow-From-Zero)
(Borrow-Across-Second-Zero)
(Borrow-Across-Zero Borrow-Wont-Recurse-Twice)
(Borrow-Across-Zero !Touched-0-Is-Quit)
(Borrow-Across-Zero !Touched-Double-Zero-Is-Quit)
(Borrow-Across-Zero !Touched-0-Is-Ten !Touched-Double-Zero-Is-Quit)
(Borrow-Across-Zero !Touched-0-Is-Ten)
(Borrow-Across-Zero)
(Borrow-Across-Zero !Touched-0-N=Blank !Touched-Double-Zero-Is-Quit)
(Borrow-Across-Zero !Touched-0-N=0 !Touched-Double-Zero-Is-Quit)
(Borrow-Across-Zero !Touched-0-N=N !Touched-Double-Zero-Is-Quit)
(Borrow-Across-Zero *Quit-When-Second-Bottom-Blank !Touched-Double-Zero-Is-Quit)
(Borrow-Across-Zero !Touched-Double-Zero-Is-Quit N-N-Causes-Borrow)
(Borrow-Add-Decrement-Insteadof-Zero)
(Borrow-Add-Decrement-Insteadof-Zero *Quit-When-Second-Bottom-Blank)
(Borrow-Add-Decrement-Insteadof-Zero *Skip-Interior-Bottom-Blank)
(Borrow-Add-Decrement-Insteadof-Zero N-N-Causes-Borrow)
(Borrow-From-One-Is-Nine Borrow-From-Zero)
(Borrow-From-One-Is-Ten Borrow-From-Zero-Is-Ten)
(Borrow-From-Zero)
(Borrow-From-Zero-Is-Ten)
(Borrow-No-Decrement)
(Borrow-No-Decrement-Except-Last *Skip-Interior-Bottom-Blank)
(Borrow-No-Decrement-Except-Last)
(Borrow-No-Decrement-Except-Last *Quit-When-Second-Bottom-Blank)
(Borrow-Treat-One-As-Zero)
(Borrow-Wont-Recurse *Only-Do-Units-Unless-Two-Columns)
(Borrow-Wont-Recurse)
(Borrow-Wont-Recurse *Only-Do-First&Last-Columns)
(Borrow-Wont-Recurse *Quit-When-Second-Bottom-Blank)
(Borrow-Wont-Recurse *Skip-Interior-Bottom-Blank)
(Borrow-Wont-Recurse N-N-Causes-Borrow)
(Borrow-Wont-Recurse Smaller-From-Larger-With-Borrow)
(Borrow-Wont-Recurse-Twice)
(Doesn't-Borrow-Except-Last *Only-Do-Units-Unless-Two-Columns)
(Doesn't-Borrow-Except-Last *Only-Do-Units&Tens Doesn't-Borrow-Unless-Bottom-Smaller)
(Doesn't-Borrow-Except-Last *Only-Do-First&Last-Columns)
(Doesn't-Borrow-Except-Last *Only-Do-First&Last-Columns Copy-Top-Except-Units)
(Doesn't-Borrow-Except-Last)
(Doesn't-Borrow-Unless-Bottom-Smaller)
(Doesnt-Borrow)
(Doesnt-Borrow *Only-Do-Units&Tens)
(Doesnt-Borrow Blank-Instead-Of-Borrow-From-Zero)
(Doesnt-Borrow *Only-Do-First&Last-Columns Quit-When-Bottom-Blank)
(Doesnt-Borrow Smaller-From-Larger-Instead-Of-Borrow-From-Zero)
(Doesnt-Borrow *Only-Do-First&Last-Columns)
(Don't-Decrement-Zero)
(Forget-Borrow-Over-Blanks Borrow-Don't-Decrement-Unless-Bottom-Smaller)
(Increment-Zero-Over-Blank)
(*Only-Do-First&Last-Columns Borrow-No-Decrement-Except-Last)
(*Only-Do-First&Last-Columns)
(*Only-Do-First&Last-Columns Doesn't-Borrow-Except-Last)
(*Only-Do-First&Last-Columns Smaller-From-Larger Quit-When-Bottom-Blank)
(*Only-Do-First&Last-Columns Smaller-From-Larger)
(*Only-Do-First&Last-Columns Smaller-From-Larger-Except-Last)
(*Only-Do-First&Last-Columns Blank-Insteadof-Borrow-Except-Last)
(*Only-Do-First&Last-Columns Always-Borrow-Left)
(Only-Do-Units Blank-Insteadof-Borrow)
(*Only-Do-Units&Tens Blank-Insteadof-Borrow)
(*Only-Do-Units&Tens *Only-Do-Units-Unless-Two-Columns Smaller-From-Larger-Except-Last)
(*Only-Do-Units-Unless-Two-Columns)
(*Only-Do-Units-Unless-Two-Columns Copy-Top-Except-Units)
(*Quit-When-Second-Bottom-Blank Smaller-From-Larger-Instead-Of-Borrow-From-Zero)
(*Quit-When-Second-Bottom-Blank Blank-Instead-Of-Borrow-From-Zero)
(*Quit-When-Second-Bottom-Blank)
(*Quit-When-Second-Bottom-Blank Stops-Borrow-At-Zero)
(*Skip-Interior-Bottom-Blank Doesn't-Borrow-Except-Last)
(*Skip-Interior-Bottom-Blank Smaller-From-Larger-Except-Last)
(*Skip-Interior-Bottom-Blank Blank-Insteadof-Borrow-Except-Last)
(*Skip-Interior-Bottom-Blank Smaller-From-Larger-Instead-Of-Borrow-From-Zero)
(*Skip-Interior-Bottom-Blank Blank-Instead-Of-Borrow-From-Zero)
(*Skip-Interior-Bottom-Blank Stops-Borrow-At-Zero)
(Smaller-From-Larger *Only-Do-Units-Unless-Two-Columns Quit-When-Bottom-Blank)
(Smaller-From-Larger-Except-Last)
(Smaller-From-Larger-Instead-Of-Borrow-From-Zero)
(Smaller-From-Larger-Instead-Of-Borrow-From-Zero N-N-Causes-Borrow)
(Smaller-From-Larger-Insteadof-Borrow-From-Double-Zero)
(Smaller-From-Larger-Insteadof-Borrow-Unless-Bottom-Smaller)
(Smaller-From-Larger-With-Borrow Smaller-From-Larger-Instead-Of-Borrow-From-Zero)
(Smaller-From-Larger-With-Borrow Blank-Instead-Of-Borrow-From-Zero)
(Smaller-From-Larger-With-Borrow Borrow-Add-Decrement-Insteadof-Zero)
(Smaller-From-Larger-With-Borrow Top-Instead-Of-Borrow-From-Zero)
(Stops-Borrow-At-Multiple-Zero)
(Stops-Borrow-At-Second-Zero)
(Stops-Borrow-At-Zero)
(Stops-Borrow-At-Zero N-N-Causes-Borrow)
(Top-After-Borrow Stops-Borrow-At-Zero)
(Top-After-Borrow Borrow-Wont-Recurse)
(Top-After-Borrow Borrow-Add-Decrement-Insteadof-Zero)
(Top-Instead-Of-Borrow-From-Zero)
(Top-Instead-Of-Borrow-From-Zero N-N-Causes-Borrow)
(Top-Insteadof-Borrow-Except-Last)
(Top-Insteadof-Borrow-From-Double-Zero)
(Top-Insteadof-Borrow-Unless-Bottom-Smaller)

Appendix 5
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Appendix 5 

Observed bug sets, overlapped by predicted bug sets

This takes the 134 bug sets that occurred in the Southbay data and lists them in three groups.
The first group contains 11 bug sets that are identical to some bug set in the set of Sierra's
predicted bug sets. The second group contains 47 bug sets that contain only bugs that Sierra cannot
generate. The third group contains 76 bug sets that have non-empty intersections with at least one
bug set from Sierra's predictions. This third group of bug sets is printed a little differently. The
intersection is on one line, surrounded by parentheses, and the rest of the bug set, if any, is on the
next line. Thus, if {A B C} and {C} are observed bug sets, and {C D} is a predicted bug set,
then the observed bug sets will be printed as:

((C)
 (A B))
((C)
 )

The other bug sets are printed in the usual way, as parenthesized lists.
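
The three-way grouping described above can be sketched as follows. This is only an illustration under stated assumptions: bug sets are represented as Python sets of names, the function name `partition` is hypothetical, and when an observed set overlaps several predicted sets the sketch reports the largest intersection, a choice the text does not specify.

```python
def partition(observed, predicted):
    """Split observed bug sets into the three groups used in this appendix."""
    predicted = [frozenset(p) for p in predicted]
    # Every bug that appears in at least one predicted bug set.
    predicted_bugs = frozenset().union(*predicted)
    identical, disjoint, overlapping = [], [], []
    for obs in map(frozenset, observed):
        if obs in predicted:
            identical.append(obs)        # group 1: identical to a predicted set
        elif not obs & predicted_bugs:
            disjoint.append(obs)         # group 2: contains no predicted bug
        else:
            # Group 3. Assumption: show the largest intersection found.
            shared = max((obs & p for p in predicted), key=len)
            overlapping.append((shared, obs - shared))
    return identical, disjoint, overlapping

# The example from the text, plus one identical and one disjoint set.
observed = [{"A", "B", "C"}, {"C"}, {"E"}, {"C", "D"}]
predicted = [{"C", "D"}]
identical, disjoint, overlapping = partition(observed, predicted)
```

With these inputs, {A B C} and {C} both land in the third group with intersection (C), matching the printed example; {C D} is identical to a predicted set, and {E} contains no predicted bug.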

Observed bug sets that are predicted by Sierra

34 occurrences
(Stops-Borrow-At-Zero)

13 occurrences
(Borrow-Across-Zero)

10 occurrences
(Borrow-From-Zero)
(Borrow-No-Decrement)

6 occurrences
(Borrow-Across-Zero !Touched-0-Is-Ten)

4 occurrences
(Borrow-No-Decrement-Except-Last)

3 occurrences
(Don't-Decrement-Zero)

2 occurrences
(Borrow-Across-Second-Zero)
(Smaller-From-Larger-Insteadof-Borrow-Unless-Bottom-Smaller)
(Stops-Borrow-At-Multiple-Zero)

1 occurrence
(Borrow-From-Zero-Is-Ten)

Observed bug sets that have no predicted bugs in them

6 occurrences
(Borrow-Across-Zero-Over-Zero Borrow-Across-Zero-Over-Blank)

4 occurrences
(0-N=N-Except-After-Borrow)
(Diff-0-N=N Zero-Insteadof-Borrow)
(Don't-Decrement-Zero-Over-Blank)

3 occurrences
(Decrement-All-On-Multiple-Zero)
(Decrement-Multiple-Zeros-By-Number-To-Right)

2 occurrences
(0-N=0-After-Borrow)
(Borrow-Across-Top-Smaller-Decrementing-To)
(Borrow-Don't-Decrement-Top-Smaller)
(Borrow-Only-From-Top-Smaller Borrow-Across-Zero-Over-Zero Borrow-Across-Zero-Over-Blank)
(Stutter-Subtract)

1 occurrence
(!Only-Write-Units-Digit N-N-After-Borrow-Causes-Borrow)
(0-N=N-After-Borrow Borrow-Across-Zero-Over-Zero Borrow-Across-Zero-Over-Blank)
(0-N=N-After-Borrow)
(0-N=N-Except-After-Borrow 1-1=0-After-Borrow)
(0-N=N-Except-After-Borrow 1-1=1-After-Borrow)
(1-1=0-After-Borrow)
(Add-Insteadof-Sub)
(Add-Lr-Decrement-Answer-Carry-To-Right)
(Borrow-Across-Zero-Over-Zero 0-N=N-Except-After-Borrow 1-1=0-After-Borrow)
(Borrow-Across-Zero-Over-Zero)
(Borrow-From-All-Zero)
(Borrow-From-Bottom-Insteadof-Zero Diff-0-N=N)
(Borrow-From-Zero&Left-Ten-Ok 0-N=N-After-Borrow)
(Borrow-From-Zero&Left-Ten-Ok)
(Borrow-Into-One=Ten Decrement-Multiple-Zeros-By-Number-To-Left)
(Borrow-Into-One=Ten Decrement-Multiple-Zeros-By-Number-To-Right
 Borrow-Across-Zero-Over-Zero)
(Borrow-Only-From-Top-Smaller)
(Borrow-Only-From-Top-Smaller 0-N=N-After-Borrow)
(Can't-Subtract)
(Decrement-All-On-Multiple-Zero Double-Decrement-One)
(Decrement-Leftmost-Zero-Only)
(Decrement-Multiple-Zeros-By-Number-To-Left)
(Decrement-Top-Leq-Is-Eight)
(Diff-0-N=N)
(Diff-0-N=N Diff-N-0=0)
(Don't-Decrement-Zero-Until-Bottom-Blank Borrow-Across-Zero-Over-Zero)
(Don't-Write-Zero)
(Double-Decrement-One)
(Double-Decrement-One Smaller-From-Larger-When-Borrowed-From)
(Ignore-Leftmost-One-Over-Blank Decrement-All-On-Multiple-Zero)
(N-N=1-After-Borrow 0-N=N-Except-After-Borrow)
(N-N=1-After-Borrow)
(Simple-Problem-Stutter-Subtract)
(Sub-Bottom-From-Top)
(Sub-Copy-Least-Bottom-Most-Top)
(Zero-Insteadof-Borrow)
Observed bug sets that overlap some predicted bug set

103 occurrences
((Smaller-From-Larger)
 )

7 occurrences
((Stops-Borrow-At-Zero)
 Diff-0-N=N)

6 occurrences
((Always-Borrow-Left)
 )
((Borrow-Across-Zero)
 Diff-0-N=N)
((Stops-Borrow-At-Zero)
 Borrow-Once-Then-Smaller-From-Larger Diff-0-N=N)

5 occurrences
((Borrow-No-Decrement)
 Diff-0-N=N)

4 occurrences
((Borrow-Across-Zero)
 Diff-0-N=0)
((Quit-When-Bottom-Blank Smaller-From-Larger)
 )

3 occurrences
((Stops-Borrow-At-Zero)
 Borrow-Into-One=Ten)
((Smaller-From-Larger)
 Ignore-Leftmost-One-Over-Blank)

2 occurrences
((Borrow-Don't-Decrement-Unless-Bottom-Smaller)
 )
((Smaller-From-Larger-Instead-Of-Borrow-From-Zero)
 Borrow-Once-Then-Smaller-From-Larger Diff-0-N=N)
((Stops-Borrow-At-Zero)
 Smaller-From-Larger-When-Borrowed-From)
((Stops-Borrow-At-Zero)
 Diff-0-N=N Smaller-From-Larger-When-Borrowed-From)

1 occurrence
((Stops-Borrow-At-Multiple-Zero)
 !Only-Write-Units-Digit N-N-After-Borrow-Causes-Borrow)
((Borrow-Across-Zero)
 !Sub-Units-Special Smaller-From-Larger)
((Smaller-From-Larger)
 !Write-Left-Ten Diff-0-N=0)
((Borrow-Across-Second-Zero)
 !Write-Left-Ten Diff-0-N=N)
((Forget-Borrow-Over-Blanks)
 !Write-Left-Ten Diff-0-N=N)
((!Touched-0-N=N Borrow-Across-Zero)
 Diff-N-0=0)
((!Touched-0-N=N Borrow-Across-Zero)
 Borrow-Once-Then-Smaller-From-Larger)
((!Touched-0-N=N Borrow-Across-Zero)
 Borrow-Across-Second-Zero Smaller-From-Larger-When-Borrowed-From)
((Smaller-From-Larger-Instead-Of-Borrow-From-Zero)
 0-N=N-After-Borrow N-N=1-After-Borrow)
((Blank-Insteadof-Borrow)
 Diff-0-N=N)
((Borrow-Across-Second-Zero)
 Don't-Write-Zero)
((Borrow-Across-Second-Zero)
 Borrow-Skip-Equal)
((Borrow-Across-Zero)
 0-N=0-Except-After-Borrow)
((Borrow-Across-Zero)
 1-1=0-After-Borrow)
((Borrow-Across-Zero)
 Borrow-Once-Then-Smaller-From-Larger 0-N=N-Except-After-Borrow)
((Borrow-Across-Zero)
 Sub-One-Over-Blank 0-N=N-After-Borrow 0-N=N-Except-After-Borrow)
((Borrow-Across-Zero)
 Borrow-Skip-Equal)
((Borrow-Across-Zero)
 Quit-When-Bottom-Blank 0-N=0-After-Borrow)
((Borrow-Across-Zero)
 Forget-Borrow-Over-Blanks Diff-0-N=N)
((Borrow-Across-Zero)
 Ignore-Leftmost-One-Over-Blank Borrow-Skip-Equal)
((Borrow-Across-Zero !Touched-0-N=N)
 )
((Borrow-Don't-Decrement-Unless-Bottom-Smaller)
 X-N=0-After-Borrow)
((Borrow-Don't-Decrement-Unless-Bottom-Smaller)
 Don't-Write-Zero)
((Borrow-From-One-Is-Nine Borrow-From-Zero)
 Diff-0-N=N-When-Borrow-From-Zero)
((Borrow-From-One-Is-Nine Borrow-From-Zero)
 Don't-Decrement-Zero-Over-Blank)
((Borrow-From-One-Is-Ten Borrow-From-Zero-Is-Ten)
 Borrow-Only-Once)
((Borrow-From-Zero)
 0-N=0-After-Borrow)
((Borrow-From-Zero)
 0-N=N-After-Borrow)
((Borrow-No-Decrement)
 Diff-0-N=0)
((Borrow-No-Decrement)
 Smaller-From-Larger-Except-Last Smaller-From-Larger-Insteadof-Borrow-Unless-Bottom-Smaller
 Diff-0-N=N)
((Borrow-No-Decrement)
 Sub-One-Over-Blank)
((Borrow-No-Decrement-Except-Last)
 Treat-Top-Zero-As-Ten)
((Borrow-No-Decrement-Except-Last)
 Decrement-Top-Leq-Is-Eight X-N=N-After-Borrow)
((Borrow-Treat-One-As-Zero)
 N-N=1-After-Borrow Don't-Decrement-Zero-Over-Blank)
((Only-Do-Units)
 Borrow-Unit-Diff)
((Stops-Borrow-At-Zero)
 Diff-0-N=0 Diff-N-0=0)
((Doesn't-Borrow-Except-Last)
 Diff-0-N=0 Diff-N-0=0 Smaller-From-Larger-Insteadof-Borrow-Unless-Bottom-Smaller)
((Smaller-From-Larger-Except-Last)
 Diff-0-N=0 Diff-N-0=0 Smaller-From-Larger-Insteadof-Borrow-Unless-Bottom-Smaller)
((Don't-Decrement-Zero)
 Diff-0-N=N-When-Borrow-From-Zero)
((Smaller-From-Larger)
 Diff-N-0=0 Diff-0-N=0)
((Borrow-Across-Second-Zero)
 Don't-Decrement-Zero)
((Don't-Decrement-Zero)
 1-1=0-After-Borrow)
((Don't-Decrement-Zero)
 Decrement-One-To-Eleven)
((Forget-Borrow-Over-Blanks)
 )
((Forget-Borrow-Over-Blanks)
 Borrow-Don't-Decrement-Top-Smaller Borrow-Skip-Equal)
((N-N-Causes-Borrow)
 )
((Smaller-From-Larger)
 Diff-N-N=N Diff-0-N=0)
((Smaller-From-Larger)
 Diff-0-N=0)
((Smaller-From-Larger-Except-Last)
 Decrement-All-On-Multiple-Zero)
((Smaller-From-Larger-Instead-Of-Borrow-From-Zero)
 Diff-0-N=N Smaller-From-Larger-When-Borrowed-From)
((Smaller-From-Larger-Instead-Of-Borrow-From-Zero)
 Borrow-Once-Then-Smaller-From-Larger)
((Smaller-From-Larger-Insteadof-Borrow-Unless-Bottom-Smaller)
 0-N=N-Except-After-Borrow)
((Top-Instead-Of-Borrow-From-Zero)
 Smaller-From-Larger-Insteadof-Borrow-Unless-Bottom-Smaller Diff-0-N=N)
((Stops-Borrow-At-Zero)
 0-N=0-Except-After-Borrow 1-1=0-After-Borrow)
((Stops-Borrow-At-Zero)
 Borrow-Across-Zero-Over-Zero 1-1=0-After-Borrow)
((Stops-Borrow-At-Zero)
 1-1=1-After-Borrow)
((Stops-Borrow-At-Zero)
 Diff-0-N=0)
((Stops-Borrow-At-Zero)
 Ignore-Leftmost-One-Over-Blank)
((Stops-Borrow-At-Zero)
 0-N=0-After-Borrow)
((Stops-Borrow-At-Zero)
 Borrow-Once-Then-Smaller-From-Larger)
((Stops-Borrow-At-Zero)
 1-1=0-After-Borrow)
((Stops-Borrow-At-Zero)
 Diff-0-N=N Don't-Write-Zero)

Appendix 6

Predicted bug sets, overlapped by observed bug sets

This takes the 119 bug sets that were generated by Sierra for the Southbay experiment and
lists them in three groups. The first group contains 11 bug sets that are identical to some observed
bug set. The second group contains 40 bug sets that contain only unobserved bugs. The third
group contains 68 bug sets that have non-empty intersections when intersected with at least one
observed bug set. This third group of bug sets is printed a little differently. The intersection is on
one line, surrounded by parentheses, and the rest of the bug set, if any, is on the next line. Thus, if
{A B C} and {C} are predicted bug sets, and {C D} is an observed bug set, then the two predicted
bug sets will be printed as:

((C)
 (A B))
((C)
 )

The other bug sets are printed in the usual way, as parenthesized lists.

Predicted bug sets, identical to some observed bug set

(Stops-Borrow-At-Zero)
(Borrow-Across-Zero)
(Borrow-From-Zero)
(Borrow-No-Decrement)
(Borrow-Across-Zero !Touched-0-Is-Ten)
(Borrow-No-Decrement-Except-Last)
(Don't-Decrement-Zero)
(Borrow-Across-Second-Zero)
(Smaller-From-Larger-Insteadof-Borrow-Unless-Bottom-Smaller)
(Stops-Borrow-At-Multiple-Zero)
(Borrow-From-Zero-Is-Ten)

Predicted bug sets with no observed bugs

(Blank-Instead-Of-Borrow-From-Zero)
(Blank-Insteadof-Borrow-Except-Last)
(*Blank-Insteadof-Borrow-From-Double-Zero)
(Blank-Insteadof-Borrow-Unless-Bottom-Smaller)
(*Blank-With-Borrow Blank-Instead-Of-Borrow-From-Zero)
(*Blank-With-Borrow Borrow-Wont-Recurse)
(*Blank-With-Borrow Borrow-Add-Decrement-Insteadof-Zero)
(Borrow-Add-Decrement-Insteadof-Zero)
(Borrow-Add-Decrement-Insteadof-Zero *Quit-When-Second-Bottom-Blank)
(Borrow-Add-Decrement-Insteadof-Zero *Skip-Interior-Bottom-Blank)
(Borrow-Wont-Recurse *Only-Do-Units-Unless-Two-Columns)
(Borrow-Wont-Recurse)
(Borrow-Wont-Recurse *Only-Do-First&Last-Columns)
(Borrow-Wont-Recurse *Quit-When-Second-Bottom-Blank)
(Borrow-Wont-Recurse *Skip-Interior-Bottom-Blank)
(Borrow-Wont-Recurse Smaller-From-Larger-With-Borrow)
(Borrow-Wont-Recurse-Twice)
(Doesn't-Borrow-Unless-Bottom-Smaller)
(Doesnt-Borrow)
(Doesnt-Borrow *Only-Do-Units&Tens)
(Doesnt-Borrow Blank-Instead-Of-Borrow-From-Zero)
(Doesnt-Borrow *Only-Do-First&Last-Columns)
(Increment-Zero-Over-Blank)
(*Only-Do-First&Last-Columns)
(*Only-Do-First&Last-Columns Blank-Insteadof-Borrow-Except-Last)
(*Only-Do-Units-Unless-Two-Columns)
(*Only-Do-Units-Unless-Two-Columns Copy-Top-Except-Units)
(*Quit-When-Second-Bottom-Blank Blank-Instead-Of-Borrow-From-Zero)
(*Quit-When-Second-Bottom-Blank)
(*Skip-Interior-Bottom-Blank Blank-Insteadof-Borrow-Except-Last)
(*Skip-Interior-Bottom-Blank Blank-Instead-Of-Borrow-From-Zero)
(Smaller-From-Larger-Insteadof-Borrow-From-Double-Zero)
(Smaller-From-Larger-With-Borrow Blank-Instead-Of-Borrow-From-Zero)
(Smaller-From-Larger-With-Borrow Borrow-Add-Decrement-Insteadof-Zero)
(Stops-Borrow-At-Second-Zero)
(Top-After-Borrow Borrow-Wont-Recurse)
(Top-After-Borrow Borrow-Add-Decrement-Insteadof-Zero)
(Top-Insteadof-Borrow-Except-Last)
(Top-Insteadof-Borrow-From-Double-Zero)
(Top-Insteadof-Borrow-Unless-Bottom-Smaller)



Predicted bug sets, overlapped by observed bug sets

((Borrow-Across-Zero)
 !Touched-0-N=0 !Touched-Double-Zero-Is-Quit N-N-Causes-Borrow)
((Borrow-Across-Zero)
 !Touched-0-N=Blank *Quit-When-Second-Bottom-Blank)
((Borrow-Across-Zero)
 !Touched-0-N=Blank !Touched-Double-Zero-Is-Quit N-N-Causes-Borrow)
((!Touched-0-N=N Borrow-Across-Zero)
 *Quit-When-Second-Bottom-Blank !Touched-Double-Zero-Is-Quit)
((!Touched-0-N=N Borrow-Across-Zero)
 !Touched-Double-Zero-Is-Quit N-N-Causes-Borrow)
((Borrow-Across-Zero)
 !Touched-0-Is-Quit Borrow-Wont-Recurse-Twice)
((Borrow-Across-Zero)
 !Touched-0-Is-Quit *Quit-When-Second-Bottom-Blank)
((Borrow-Across-Zero)
 !Touched-0-Is-Quit N-N-Causes-Borrow)
((Borrow-Across-Zero)
 !Touched-0-Is-Quit *Skip-Interior-Bottom-Blank)
((Borrow-Across-Zero)
 !Touched-0-Is-Ten Borrow-Wont-Recurse-Twice)
((Borrow-Across-Zero)
 !Touched-0-Is-Ten *Quit-When-Second-Bottom-Blank !Touched-Double-Zero-Is-Quit)
((Borrow-Across-Zero)
 !Touched-0-Is-Ten !Touched-Double-Zero-Is-Quit N-N-Causes-Borrow)
((N-N-Causes-Borrow)
 Blank-Instead-Of-Borrow-From-Zero)
((Blank-Insteadof-Borrow)
 *Only-Do-Units&Tens *Only-Do-Units-Unless-Two-Columns)
((Quit-When-Bottom-Blank)
 Blank-Insteadof-Borrow *Only-Do-First&Last-Columns)
((Blank-Insteadof-Borrow)
 *Only-Do-First&Last-Columns)
((Top-Instead-Of-Borrow-From-Zero)
 *Blank-With-Borrow)
((Smaller-From-Larger-Instead-Of-Borrow-From-Zero)
 *Blank-With-Borrow)
((Borrow-Across-Zero)
 Borrow-Wont-Recurse-Twice)
((Borrow-Across-Zero)
 !Touched-0-Is-Quit)
((Borrow-Across-Zero)
 !Touched-Double-Zero-Is-Quit)
((Borrow-Across-Zero)
 !Touched-0-Is-Ten !Touched-Double-Zero-Is-Quit)
((Borrow-Across-Zero)
 !Touched-0-N=Blank !Touched-Double-Zero-Is-Quit)
((Borrow-Across-Zero)
 !Touched-0-N=0 !Touched-Double-Zero-Is-Quit)
((Borrow-Across-Zero !Touched-0-N=N)
 !Touched-Double-Zero-Is-Quit)
((Borrow-Across-Zero)
 *Quit-When-Second-Bottom-Blank !Touched-Double-Zero-Is-Quit)
((Borrow-Across-Zero)
 !Touched-Double-Zero-Is-Quit N-N-Causes-Borrow)
((N-N-Causes-Borrow)
 Borrow-Add-Decrement-Insteadof-Zero)
((Borrow-From-One-Is-Nine Borrow-From-Zero)
 )
((Borrow-From-One-Is-Ten Borrow-From-Zero-Is-Ten)
 )
((Borrow-No-Decrement-Except-Last)
 *Skip-Interior-Bottom-Blank)
((Borrow-No-Decrement-Except-Last)
 *Quit-When-Second-Bottom-Blank)
((Borrow-Treat-One-As-Zero)
 )
((N-N-Causes-Borrow)
 Borrow-Wont-Recurse)
((Doesn't-Borrow-Except-Last)
 *Only-Do-Units-Unless-Two-Columns)
((Doesn't-Borrow-Except-Last)
 *Only-Do-Units&Tens Doesn't-Borrow-Unless-Bottom-Smaller)
((Doesn't-Borrow-Except-Last)
 *Only-Do-First&Last-Columns)
((Doesn't-Borrow-Except-Last)
 *Only-Do-First&Last-Columns Copy-Top-Except-Units)
((Doesn't-Borrow-Except-Last)
 )
((Quit-When-Bottom-Blank)
 Doesnt-Borrow *Only-Do-First&Last-Columns)
((Smaller-From-Larger-Instead-Of-Borrow-From-Zero)
 Doesnt-Borrow)
((Borrow-Don't-Decrement-Unless-Bottom-Smaller)
 Forget-Borrow-Over-Blanks)
((Borrow-No-Decrement-Except-Last)
 *Only-Do-First&Last-Columns)
((Doesn't-Borrow-Except-Last)
 *Only-Do-First&Last-Columns)
((Smaller-From-Larger Quit-When-Bottom-Blank)
 *Only-Do-First&Last-Columns)
((Smaller-From-Larger)
 *Only-Do-First&Last-Columns)
((Smaller-From-Larger-Except-Last)
 *Only-Do-First&Last-Columns)
((Always-Borrow-Left)
 *Only-Do-First&Last-Columns)
((Blank-Insteadof-Borrow)
 Only-Do-Units)
((Blank-Insteadof-Borrow)
 *Only-Do-Units&Tens)
((Smaller-From-Larger-Except-Last)
 *Only-Do-Units&Tens *Only-Do-Units-Unless-Two-Columns)
((Smaller-From-Larger-Instead-Of-Borrow-From-Zero)
 *Quit-When-Second-Bottom-Blank)
((Stops-Borrow-At-Zero)
 *Quit-When-Second-Bottom-Blank)
((Doesn't-Borrow-Except-Last)
 *Skip-Interior-Bottom-Blank)
((Smaller-From-Larger-Except-Last)
 *Skip-Interior-Bottom-Blank)
((Smaller-From-Larger-Instead-Of-Borrow-From-Zero)
 *Skip-Interior-Bottom-Blank)
((Stops-Borrow-At-Zero)
 *Skip-Interior-Bottom-Blank)
((Smaller-From-Larger Quit-When-Bottom-Blank)
 *Only-Do-Units-Unless-Two-Columns)
((Smaller-From-Larger)
 *Only-Do-Units&Tens 0-N=N-Except-After-Borrow)
((Smaller-From-Larger-Except-Last)
 )
((Smaller-From-Larger-Instead-Of-Borrow-From-Zero)
 )
((Smaller-From-Larger-Instead-Of-Borrow-From-Zero)
 N-N-Causes-Borrow)
((Smaller-From-Larger-Instead-Of-Borrow-From-Zero)
 Smaller-From-Larger-With-Borrow)
((Top-Instead-Of-Borrow-From-Zero)
 Smaller-From-Larger-With-Borrow)
((Stops-Borrow-At-Zero)
 N-N-Causes-Borrow)
((Stops-Borrow-At-Zero)
 Top-After-Borrow)
((Top-Instead-Of-Borrow-From-Zero)
 )
((N-N-Causes-Borrow)
TopHnstead-Of-Borrow-From-Zero) 
Appendix 7

Version Spaces as Applicability Conditions

Chapter 19 discussed what the learner's bias is for test patterns. It was shown that a
topological bias (preferring maximally general test patterns) was better than a bias based on
teleological rationalizations. However, there is a problem with this choice. This appendix discusses
that problem, and proposes a revision to the theory that fixes it. However, this revision has not
(yet) been tried out on Sierra. The arguments should be taken with a grain of salt.

The topological bias depends crucially on negative instances 

The topological bias takes the most general test patterns that are consistent with the examples.
If there are no negative instances, the most general test pattern is the trivially general pattern, which
matches all possible problem states. Thus, if a lesson has no discrimination examples (i.e., examples
with negative instances of the subprocedure's test pattern), the new subprocedure will have a test
pattern that is always true. Such a lesson generates the bug

Always-Borrow:   [Figure: two worked problems, 67 - 12 and 660 - 280, in which a borrow is
                 performed in every column; both answers are marked wrong (X).]

This bug borrows in every column, regardless of the relative values of the column's digits. (N.B., it
tries to borrow in the leftmost column, does some local problem solving, and winds up doing an
ordinary column difference in the leftmost column.) This bug is predicted for lessons that teach
borrow without giving discrimination examples. For borrowing, one discrimination example is:

[Figure: problem states a, b, and c of a two-column subtraction problem (67 - 21 in states a and b)
that is solved without borrowing.]

At the beginning of the problem's solution (state b), when the goal SUB1COL is called to process
the units column, it has a choice between borrowing and taking the usual column difference.
Because the teacher does an ordinary column difference, the learner can infer that the teacher's test
pattern for BORROW was false. (This inference involves the conflict resolution conventions of the
interpreter; see section 10.5.) The inferred falsehood of the teacher's test pattern means that the
current problem state, b, is a negative instance. The learner compares it to generalizations of
positive instances, such as the problem state where the test pattern was true. The only difference
between the negative instance and the generalized positive instance is that T<B is false in the
negative instance's units column. Hence, T<B is a most general generalization: if the test pattern
were only T<B, then it would be consistent with all the instances, including the negative instance.
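
The dependence on negative instances can be illustrated with a small sketch. It is not Sierra's
inducer. It assumes, purely for illustration, that a pattern is a set of relations (written as
strings), that a pattern matches a problem state when all its relations hold there, and that only
the smallest consistent patterns are wanted; the name most_general_patterns is hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List

Pattern = FrozenSet[str]   # assumed encoding: a pattern is a set of relations
State = FrozenSet[str]     # assumed encoding: the relations true of a problem state


def most_general_patterns(positives: List[State], negatives: List[State]) -> List[Pattern]:
    """Smallest sub-patterns of the positives' common relations that reject every negative."""
    specific = frozenset.intersection(*positives)   # what all positive instances share
    for size in range(len(specific) + 1):           # try the most general candidates first
        found = [frozenset(c) for c in combinations(sorted(specific), size)
                 if not any(frozenset(c).issubset(n) for n in negatives)]
        if found:
            return found
    return []


borrow_positive = [frozenset({"(LessThan? T B)", "(DIGIT T)", "(DIGIT B)"})]
no_borrow_negative = [frozenset({"(DIGIT T)", "(DIGIT B)"})]

print(most_general_patterns(borrow_positive, []))                  # the empty pattern {}
print(most_general_patterns(borrow_positive, no_borrow_negative))  # only (LessThan? T B)
```

With no negative instances the only answer is the empty pattern, which is always true; this is
the condition under which Always-Borrow is predicted. A single negative instance lacking T<B is
enough to force T<B into the pattern.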

What if there are no discrimination examples in a lesson?

Some textbooks do not have discrimination examples in their lessons. Houghton Mifflin's
1981 textbook series, for example, does not use discrimination examples in teaching subtraction.
The absence of discrimination examples would seem to predict that all Houghton Mifflin students
acquire the bug Always-Borrow. But surely some students learn the correct subtraction procedure.
The apparent contradiction stems from the tacit assumption that a negative instance has to be in a
subprocedure's lesson in order for it to be effective. This assumption is apparently false. In fact,
the lessons following the initial borrow lessons in the Houghton Mifflin text have the requisite
negative instances. In a borrow-from-zero example, there is always a problem state which is a
negative instance for borrowing (e.g., d below).

          5  9
d.        6  0 17
        - 2  3  8
                9

In order for Sierra's learner to make use of negative instances, it must remember something about
the positive instances. Induction needs to know what was true about the positive instances so that it
can determine which of these truths is false of the negative instances. Some induction algorithms
store all the positive instances (cf. Dietterich & Michalski, 1981). The version space algorithm (see
section 13.1) saves the maximally specific generalizations of the positive instances. In order for the
learner to make use of the negative borrow instances in, e.g., the borrow-from-zero lesson, it must
store information about positive borrow instances from the borrow lesson. This information cannot
reside in the test pattern. Since there have been no negative instances, the test pattern is trivially
general, i.e., {}. In order to save the information, there must be ancillary information storage in the
student procedure.

Using version spaces as applicability conditions

The obvious solution is to have the procedure store the whole version space for a test pattern
instead of just the test pattern itself. This technique has been used before, for different reasons, by
Mitchell et al. (1981; 1983) in their calculus learner, LEX. LEX uses version spaces as tests. Given a
version space <S, G>, the interpretation of it as a test is:

A. A test is true if for all s∈S, s matches the current problem state.

B. A test is false if there is no g∈G such that g matches the current problem state.

C. If a test is neither true nor false, its value is called indeterminate.

This interpretation is logically correct, in a sense. Since the g∈G are all maximally general
generalizations, the target test pattern, whatever it is, is less general than they are. Hence, if none of
the g match, then the test pattern couldn't match. So if no g matches, one can infer that the test
pattern would be false. Similarly, if all the maximally specific generalizations match, then the
unknown test pattern would also match since it is a generalization of at least one of them. If some
g match and some s don't match, then one can infer nothing about the truth of the unknown target
pattern. Depending on where the target pattern lies between the s and the g patterns, it could be
either true or false. So a third truth value is needed. It is labelled indeterminate.
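
Clauses A through C can be written down directly. The following is a minimal sketch, not Sierra's
matcher: it assumes that a pattern is a set of relations and that a pattern matches a problem state
when every one of its relations holds in that state. The function names and the string encoding of
relations are hypothetical.

```python
from typing import FrozenSet, Iterable

Pattern = FrozenSet[str]   # assumed encoding: a pattern is a set of relations
State = FrozenSet[str]     # assumed encoding: the relations true of a problem state


def matches(pattern: Pattern, state: State) -> bool:
    """A pattern matches when all of its relations hold in the state."""
    return pattern.issubset(state)


def version_space_test(S: Iterable[Pattern], G: Iterable[Pattern], state: State) -> str:
    """Three-valued interpretation of a version space <S, G> as a test."""
    if all(matches(s, state) for s in S):       # clause A
        return "true"
    if not any(matches(g, state) for g in G):   # clause B
        return "false"
    return "indeterminate"                      # clause C


G = [frozenset({"(LessThan? T B)"})]
S = [frozenset({"(LessThan? T B)", "(DIGIT T)"})]

print(version_space_test(S, G, frozenset({"(LessThan? T B)", "(DIGIT T)"})))  # true
print(version_space_test(S, G, frozenset({"(DIGIT T)"})))                     # false
print(version_space_test(S, G, frozenset({"(LessThan? T B)", "(XNUM T)"})))   # indeterminate
```

The third call is the interesting one: the maximally general pattern matches but the maximally
specific one does not, so nothing can be inferred about the unknown target pattern.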

A version space is represented by two sets of patterns, S and G. There is ample evidence that
rules use only a single pattern of each kind. Basically, if this were not the case, then every learner
would learn exactly the same applicability conditions since, by definition, only one version space
results from a given set of examples. All the overgeneralization bugs would be ruled out if the
whole version space were used as the applicability condition. In short, to account for the fact that
different learners induce different applicability conditions, version spaces must be used in the
following way:

Version space

If <S, G> is the version space for an applicability condition, then for all s∈S and for all g∈G,
Induce outputs an adjoining rule with the pair <s, g> as the applicability condition. The
condition is true if s matches exactly, false if g doesn't match exactly, and indeterminate
otherwise.

This hypothesis replaces two hypotheses in the theory: test pattern bias and test pattern matching.
Moreover, a third hypothesis must be revised slightly. The interpreter must have some conventions
for handling the indeterminate truth value. The conflict resolution hypothesis must be revised to
become:

Conflict resolution 

1. A rule may be executed only if it has not yet been executed for this instance of the current goal.

2. In the representation of procedures, the rules of AND goals are linearly ordered. If the current
goal is an AND goal and there are several unexecuted applicable rules, then the interpreter
picks the first one in the goal's order.

3. If the current goal is an OR goal, then

A. If one rule's test is true, then the interpreter picks that rule.

B. If more than one rule's test is true, the interpreter picks the rule that was acquired most
recently.

C. If no rule's test is true, and exactly one rule's test is indeterminate (the others being false),
the interpreter picks the rule with the indeterminate test.

D. Otherwise, a halt impasse occurs.

Clause B is a common conflict resolution strategy in production systems (McDermott & Forgy,
1978). Here it is used because it is necessary in order to allow the learner to infer negative
instances for test pattern induction (see section 10.5). Clauses A and C are plain common sense:
Take a true rule if there is just one and don't take false rules.
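
Clause 3 of the revised hypothesis can be sketched as follows. This is an illustration rather than
Sierra's interpreter: the Rule record, its acquired field (standing in for acquisition order), and
the convention of returning None for a halt impasse are all assumptions, and the rule list is
assumed to contain only rules not yet executed for this goal instance (clause 1).

```python
from typing import List, NamedTuple, Optional


class Rule(NamedTuple):
    name: str
    acquired: int   # hypothetical: larger means acquired more recently
    test: str       # "true", "false", or "indeterminate"


def pick_or_rule(rules: List[Rule]) -> Optional[Rule]:
    """Choose a rule for an OR goal; None stands for a halt impasse."""
    true_rules = [r for r in rules if r.test == "true"]
    if true_rules:
        # Clauses A and B: a single true rule wins; among several, the most recent wins.
        return max(true_rules, key=lambda r: r.acquired)
    indeterminate = [r for r in rules if r.test == "indeterminate"]
    if len(indeterminate) == 1:
        return indeterminate[0]   # clause C
    return None                   # clause D


borrow = Rule("borrow", acquired=2, test="indeterminate")
difference = Rule("column-difference", acquired=1, test="indeterminate")
print(pick_or_rule([borrow, difference]))                         # None: halt impasse
print(pick_or_rule([borrow, difference._replace(test="false")]))  # the borrow rule
```

Two indeterminate rules fall through to clause D, which is the case that matters for the bug
derivations discussed in this appendix.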

More direct arguments for using version spaces as applicability conditions

The above argument for representing test patterns as version spaces was based on two rather
dubious criteria. First, it assumed that the learning model's chronology should mimic the
chronology of human learners at the grain size of lessons. Second, it assumed that all information
from a lesson is transmitted chronologically via the learner's procedure. Both criteria go beyond the
main empirical criterion, which involves predicting the observed bugs. So the argument above is
more an informal motivation than a theoretically impeccable demonstration. Nonetheless, its
conclusion makes some verifiable bug predictions. These provide a proper support for the position
that applicability conditions use version spaces.

The conflict resolution conventions cause impasses in exactly the right places to generate
certain bugs, such as the following one:

Smaller-From-Larger-When-Borrowed-From:

          4                        1
    a.    5 17  6         b.    5  2 17
        - 1  9  1             - 1  8  9
          3  8  5               4  7  8   X


The learner can do problems with isolated borrows, as in a, but when confronted with a problem
that requires adjacent borrows, such as b, the solver impasses instead of doing the second borrow.
This bug derives from the local problem solver taking the Force repair (which will be explained in a
moment). This repair ultimately results in the solver answering the tens column of b by inverting it
(i.e., the answer is 8-1). The important issue is the cause of the impasse. It seems that this bug
occurs when the learner has not yet taken the lesson on adjacent borrows. That is, the learner
assimilated examples such as d and e below, but not ones like f.

[Figure: four worked subtraction examples, c through f. Examples c, d, and e each contain a single
(isolated) borrow; example f requires adjacent borrows.]

The presence of negative instances in d and e means that the version space for borrow's test will
have non-trivial maximally general generalizations. In fact, the borrow rule's version space will be
roughly:

G: {{(LessThan? T B)}}

S: {{...(LessThan? T B) (DIGIT T)...} ...}

There is only one maximally general generalization, T<B. Crucially, some of the maximally specific
generalizations in S expect the top member of the column to be a DIGIT, not a crossed-out digit
(which is categorized as an XNUM by the grammar). They expect this because in all the positive
instances (e.g., in c, d, and e), the top cell of the column isn't crossed out. The first time the inducer
can encounter a crossed-out top digit in a column that requires borrowing is in problems like f that
require adjacent borrows. The learner hasn't encountered such problems yet. At least one s∈S (if
not all) expect (DIGIT T). In problem b, the top digit of the tens column is crossed out when
borrow's version space is tested. Hence, at least one s∈S will not match. This means that the test
will be indeterminate. By similar reasoning, it can be shown that the test is indeterminate in the
ordinary column difference rule as well. Thus, the column processing OR has two indeterminate
rules. This triggers an impasse.

In effect, the solver can't decide what to do because neither rule is clearly true or clearly false.
The crossed-out T<B column doesn't look exactly like a crossed-out T>B column or a
non-crossed-out T<B column. Since the interpreter can't decide, it has the Force repair make the
decision.

The Force repair is very simple: It chooses a rule that hasn't been executed and pushes the
rule's action onto the stack. This causes the rule's action to be executed when interpretation
resumes. In this case, the rules that Force must choose between have indeterminate applicability
conditions. Force chooses the ordinary column processing rule. This ultimately generates the bug
shown above. Several other bugs have similar derivations, with different choices for the repair.
The existence of these bugs supports the use of version spaces as applicability conditions.

Appendix 8
The Interstate Reference Problem

There seem to be good reasons for representing focus of attention explicitly (see section 11.1).
Yet it would be nice, in one sense, if this were not the case. Representing internally something that
has external meaning leads into a real rats' nest of problems involving reference across state changes.
This problem will be discussed in the context of mathematical skills, of course, but the problem is
more general. I have no good solution for it, but it is important enough that it deserves discussion.
To put it concretely, the choice concerns what the representation should be for instances of
notational objects. That is, what is it that fetch patterns retrieve? What is it that flows through the
procedure's data flow pathways? How does one represent focus of attention when what is being
attended to is a part of the problem state image? Most importantly, what happens to these internally
stored objects when the external things that they refer to change?

States and state changes

Foci of attention refer into changeable problem states. Before asking what a focus of
attention is, we need to ask what problem states and problem state changes are. In Newell and
Simon's approach (1972), a theorist-supplied problem space specifies the representation for problem
states and state change operators. Problem spaces can be different for different tasks and even for
different subjects performing the same task. There is a problem with this approach that is best
described by an example (for other problems, see section 13.1). The LEX (Mitchell et al., 1981;
1983) and ALEX (Neves, 1981) learners use a problem space representation of integral and algebraic
equations. They represent a problem state as a tree. A tree representing the problem state

2x = 9-5

is shown in figure A8-1a. The state change operators in these two learners are tree pruning
operations. When the subject adds a new line to the problem state above, yielding

2x = 9-5
2x = 4

the algebraic transformation underlying the subject's writing actions is represented by replacing a
certain node in the problem state tree. Figure A8-1b shows the tree that represents the new
problem state. The expression node of a has been replaced by a constant node, 4. All the state
change operators in these two learners are similar. They modify a tree that represents the last
equation on the worksheet.

What a human subject actually does, of course, is write successive lines, copying the
unchanged portion of the equation from one line to the next. Yet, for LEX and ALEX, such state
changes are not copying, but mutation. Essentially, their representation embeds, in a tacit way, the
student's belief that the current focus of attention is the lowest equation in a vertical list. That is, a
piece of student knowledge is represented outside the grammar and the procedure. It is hidden in
the data structure used to represent the problem space (i.e., a tree) and the primitive operators used
to represent writing actions. Some way of keeping student knowledge out of problem states and
state change operators is needed. This is easily done.

[Figure: (a) a tree whose nodes are Term (2 x), =, and Expr (9-5); (b) the same tree with the Expr
node replaced by the constant 4.]

Figure A8-1

Trees representing problem states before and after an algebraic transformation.


Since the tasks covered by this theory are written symbol manipulation problems of various
kinds, the grain size can be set at the level of individual symbols (i.e., letters, digits, arithmetic signs,
brackets, horizontal bars, etc.). A problem state is represented as a set of symbol-location pairs,
where a symbol's location is represented, say, by the Cartesian coordinates of the symbol's lower left
corner and its upper right corner. The state change operations all represent writing specified
symbols at specified locations. Writing a symbol is represented by adding an element to the set that
represents the problem state. This grain size is small enough to be universal across subjects and
tasks, and yet large enough to avoid character recognition and other theoretically irrelevant
activities. The following hypothesis formalizes the particular approach adopted by the theory:

Universal state change operators 

The set of state change operators (primitive goals) is universal across tasks and subjects.
In particular, the state change operators are

(Write x s)      Writes symbol s in location x.
(CrossOut x)     Draws a scratch mark over the object x.
(Bar x)          Draws a horizontal or vertical line between the boundaries of
                 the parts of the aggregate object x.

There is a reason why there are three primitive operators instead of just Write. As discussed in 
section 15.2, the syntax of scratch marks and bars is somewhat different than other symbols. In 
particular, scratch marks are the only mathematical symbols that are placed on top of other symbols.
All the other symbols do not overlap each other. Bars also have unique properties. Vertical and
horizontal bars are often shared, in a certain sense, by several aggregate objects. For instance, the
bar in subtraction problems is shared by all the columns in the problems. One conjecture, due to
Jim Greeno (personal communication), is that bars are not symbols per se, but instead are used to
mark the boundaries between aggregate objects.
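
The symbol-level representation is simple enough to sketch. What follows is an illustration under
stated assumptions, not Sierra's data structures: the Box and Mark records, the use of the
pseudo-symbols "SCRATCH" and "BAR" for the two non-Write operators, and the decrement helper
are all hypothetical. The one property taken from the text is that every operator only adds an
element to the set that represents the problem state.

```python
from typing import FrozenSet, NamedTuple


class Box(NamedTuple):
    x0: int   # lower left corner
    y0: int
    x1: int   # upper right corner
    y1: int


class Mark(NamedTuple):
    symbol: str
    box: Box


State = FrozenSet[Mark]   # a problem state is a set of symbol-location pairs


def write(state: State, x: Box, s: str) -> State:
    """(Write x s): add symbol s at location x."""
    return state | {Mark(s, x)}


def cross_out(state: State, x: Box) -> State:
    """(CrossOut x): add a scratch mark on top of whatever occupies x."""
    return state | {Mark("SCRATCH", x)}


def bar(state: State, x: Box) -> State:
    """(Bar x): add a line along the aggregate object x."""
    return state | {Mark("BAR", x)}


def decrement(state: State, digit_box: Box, new_digit: str) -> State:
    """Cross out a digit and write the new digit directly above it."""
    height = digit_box.y1 - digit_box.y0
    above = Box(digit_box.x0, digit_box.y1, digit_box.x1, digit_box.y1 + height)
    return write(cross_out(state, digit_box), above, new_digit)


tens = Box(0, 0, 1, 1)
state = decrement(write(frozenset(), tens, "7"), tens, "6")
print(sorted(state))   # the 7, the scratch mark over it, and the 6 above it
```

Nothing is ever removed or mutated: the crossed-out 7 is still in the set after the decrement,
which is what makes the operators independent of any particular task.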

One of the entailments of this small grain size is that procedures must express the students'
actions in more detail. Instead of calling a large-grained state change operator, the procedure now
calls a subprocedure that performs several writing actions. For instance, to represent decrementing
a number in subtraction, the procedure might call a subprocedure that (1) crosses out a digit, and
(2) writes the new digit vertically above the old one. In algebra equation solving, instead of calling
a tree pruning operation that represents the state change from a to b,

a.  2x = 9-5        b.  2x = 9-5
                        2x = 4

the procedure calls (1) a subprocedure that copies the left side of the equation to the next line
down, (2) a writing action that writes the equals sign on the new line, and (3) a writing action that
writes the 4 on the new equation's right side. When the grain size is fixed at the symbol level, the
procedures must express knowledge about writing notation in more detail.

The frame problem 

So far all that's been done is to plug a leak in the theory's explanatory adequacy by removing
a parameter from the theorist's control and making it a fixed part of the model instead. However,
there are entailments of this move. They involve the notorious frame problem of AI. To illustrate
how the frame problem arises in this domain, we'll return to the algebra equation solver as it makes
the state change from state a to state b:

a.  2x = 9-5        b.  2x = 9-5
                        2x = 4

Suppose that the solver pushes an instance of a goal, solve-equation, at state a, then it goes on to
find out how to solve the equation. Suppose further that the goal has the equation of state a as its
argument. At the time the solver comes to finally make the state change that yields state b, the goal
instance and its argument are deep in the stack. The goal's argument is some stored focus of
attention. What does it designate now? More importantly, what should it designate when the stack
pops back to the goal instance, and the solver checks to see if the goal has been satisfied, i.e.,
whether the equation has been solved? There are several possible choices:

1. The symbols "2x = 9-5" in state a.
2. The symbols "2x = 9-5" in state b.
3. The symbols "2x = 4" in state b.

The first choice has data flow objects (foci of attention) referring to symbols in past problem states.
For computer problem solvers, this is quite possible to implement, but it is wild as a conjecture
about what people do! Choice 2 reflects the "fact" that objects such as written symbols somehow
manage to persist over time without changing their identity, possibly because the symbols have not
moved. (But what if the student jiggles the paper? There are deep problems here.) A focus that was
forged in state a refers in state b. However, algebra procedures are most simply expressed when
choice 3 is used. This choice is exactly what LEX and ALEX use. When control pops back to the
saved goal instance, its argument now refers to the latest version of the equation. This makes it easy
to check whether the solve-equation goal is now satisfied. However, as just shown, this convention
embeds knowledge (i.e., the rule that the lowest equation in a vertical list is the one that counts
as the current version of the equation) which ought to be expressed explicitly in the student's
knowledge state. So choice 3 is not really tenable, and choice 1 is crazy. That leaves choice 2,
which is (roughly) the one that Sierra uses. So the essence of the frame problem in this domain is
representing interstate reference: What should happen to parts of the internal state that refer
(somehow) to the external state, when that external state changes out from under them? The rest of
the appendix discusses two different ways to achieve interstate reference.

Intensions or extensions?

All notational objects share the characteristic that when they are instantiated in the problem
state, they fill a certain area in the problem state. Hence, one way to represent focus of attention is
as the actual region of the problem state, a rectangle in Cartesian coordinates, say, that the object
fills. (This is a particularly appealing solution for mathematical forms because almost all of them
have the same shape: rectangular. Forms for faces and stick figures aren't so computationally
tractable.) Using a representation that stands for actual space on the problem state amounts to an
extensional treatment of focus. When a description is matched during the execution of the
procedure, its extension (referent) is found; from then on, the procedure passes the extension
around.

The obvious alternative is to represent focus intensionally. Like a Montague grammar's
interpretation of a natural language sentence, executing a description yields an intension, which is
henceforth passed through the data flow paths. Only when a referent is really needed is the
intension extended. In the case of mathematical procedures, there are two places where referents are
needed: (1) The primitive reading operation, Read, needs an extension so that it can read a location
in the problem state and return the symbol, if any, that occupies the location. (2) Primitive writing
operations, e.g., Write, need an extended form so that they can actually write in the location
specified by the form. Reading and writing are, of course, quite common in mathematical
procedures. Quite likely, every description's intension is eventually extended either as a part of a
larger intension or alone. (In this respect, mathematical procedures are probably quite different
from natural language.) Hence, another way to view the intensional representation of focus is as a
lazy evaluation scheme (Henderson & Morris, 1976). "Matching" a description (pattern) just wraps it
around the values of the goal arguments that are used in the pattern (i.e., the input variables of the
pattern). An intension looks just like a very deeply nested description with no arguments; only
when the intension is actually used is this description extended. In short, the choice between
extensions and intensions comes down to when descriptions are extended: at execution time or at
read/write time.
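
The eager/lazy contrast can be made concrete with the algebra worksheet used earlier. This is a
sketch only, under the assumptions that a worksheet is a list of lines, that a description is a
function from the worksheet to a line index, and that "the lowest equation" is the description in
question; none of these names come from Sierra.

```python
from typing import Callable, List

Worksheet = List[str]                       # assumed encoding: one string per written line
Description = Callable[[Worksheet], int]    # a description picks out a line index


def lowest_equation(sheet: Worksheet) -> int:
    """The description 'the lowest equation in the vertical list'."""
    return len(sheet) - 1


def extend_at_execution(description: Description, sheet: Worksheet) -> int:
    """Extensional focus: find the referent now and pass the referent around."""
    return description(sheet)


def make_intension(description: Description) -> Description:
    """Intensional focus: pass the unevaluated description around (a thunk)."""
    return lambda sheet: description(sheet)


sheet = ["2x = 9-5"]
extension = extend_at_execution(lowest_equation, sheet)   # resolved in state a
intension = make_intension(lowest_equation)               # not yet resolved

sheet.append("2x = 4")                                    # the state change to state b

print(sheet[extension])          # 2x = 9-5  (the referent fixed at execution time)
print(sheet[intension(sheet)])   # 2x = 4    (the referent found at read/write time)
```

The two foci agree as long as the problem state does not change between execution and use; they
come apart as soon as a line is written in between.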

As the remainder of this appendix will show, there are arguments against both sides. Roughly
speaking, if descriptions are extended at execution time, the problem state can change before the
extension is used, so it may no longer be an accurate reflection of the meaning of the description.
On the other hand, if descriptions are not fully extended until their time of use, the problem state
may have changed in such a way that the intension mis-extends. In both cases, changes in the
problem state between execution and use are fouling up the description-referent connection. With
some effort, it can be shown that this difficulty has empirical ramifications for the theory. Hence, it
is not a moot point, one that the theory can ignore.

The extensional approach 

In chapter 11, it was shown that focus can be saved on the goal stack as the argument of a
goal. If focus is represented extensionally, changes to the problem state can make some stored foci
become out of date. Take addition problems, for example. Suppose a problem that starts with two
columns, as a does,

a.    5 1        b.    5 1        c.    5 1
    + 9 6            + 9 6            + 9 6
                     1 4 7              4 7

grows to three columns, as in b. This growth will make the rectangle circumscribing the column
sequence too small at the time termination is tested. Taken literally, this would force the procedure
to believe the tens column was the last one, causing it to exit too soon, giving the answer shown in
c. Moreover, it would be impossible to represent a procedure that gives the correct answer except
by always including one more column in the original extension of the column-sequence form than
the columns that are actually visible. This sort of look-ahead accommodation for an exception
situation is difficult, if not impossible, to learn within the local learning framework erected by the
theory. In short, the theory wouldn't be able to learn any correct procedure for addition. This false
prediction is a rather damning entailment of adopting the extensional approach.
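
The premature exit can be reproduced in a few lines. The sketch below is not a Sierra procedure:
it assumes that the column sequence grows because a carry mark is written over a new leftmost
column, and the re_extend switch is a hypothetical device for comparing a focus that is fixed at
matching time with one that is recomputed from the current problem state.

```python
def add(top: str, bottom: str, re_extend: bool) -> str:
    """Column-by-column addition over the columns covered by the focus."""
    digits = {}    # answer row: column index (0 = units) -> digit
    carries = {}   # carry marks: column index -> 1 (assumed notation)

    def visible_columns() -> int:
        # Number of columns that currently contain any mark.
        return max(len(top), len(bottom), max(carries, default=-1) + 1)

    extent = visible_columns()   # the extension, found once at matching time
    col = 0
    while col < (visible_columns() if re_extend else extent):
        t = int(top[-1 - col]) if col < len(top) else 0
        b = int(bottom[-1 - col]) if col < len(bottom) else 0
        total = t + b + carries.get(col, 0)
        digits[col] = total % 10
        if total >= 10:
            carries[col + 1] = 1   # the problem state grows leftward
        col += 1
    return "".join(str(digits[c]) for c in sorted(digits, reverse=True))


print(add("51", "96", re_extend=False))   # 47: the stale extension stops at the tens column
print(add("51", "96", re_extend=True))    # 147
```

With the stale extension the loop's termination test uses the original two-column rectangle, so
the carry into the third column is never processed.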



The intensional approach

The intensional-focus hypothesis has similar problems with stored foci going awry. Changes in
the state of the problem state can cause the intension for a description that would extend correctly at
matching time to extend incorrectly at the time of its use. An example will illustrate this problem.
Suppose that a subtraction procedure describes the place where an answer should be written as the
"rightmost column that I haven't written in yet." For subtraction without borrowing, this is an
adequate description. Indeed, it is an excellent one when the procedure is equipped to suppress
leading zeros. In problems such as a,

                             6 14
a.    346172        b.    3  7  4        c.    3  7  4
    - 146171            - 1  2  8            - 1  2  8
                                               6

the procedure must traverse quite a few columns away from the tens column's zero before it can
determine that it should actually write the zero instead of suppressing it. In problem b, the intension
"rightmost column that I haven't written in" would refer to the units column at the time it's created,
namely, at the beginning of the units column's processing. However, by the time borrowing is done
and the intension is used to write the units column's answer, the intension now refers to the
hundreds column since both the units and tens have been marked up. This would yield the star bug
shown in c (given some further assumptions, which I won't describe here, that cause it to quit early).
Once again, the content of a stored focus has "gone bad" due to changes in the state of the problem
state between creation and use.
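
The mis-extension in problem b takes only a few lines to reproduce. This sketch assumes, for
illustration, that the problem state is summarized by the set of columns that have been written
in, and that borrowing marks up both the units and the tens columns; the function name is
hypothetical.

```python
from typing import Optional, Set

COLUMN_NAMES = ["units", "tens", "hundreds"]


def rightmost_unwritten(written: Set[int], n_columns: int) -> Optional[int]:
    """Extend the description 'rightmost column that I haven't written in yet'."""
    for col in range(n_columns):   # column 0 is the units column
        if col not in written:
            return col
    return None


written: Set[int] = set()
at_creation = rightmost_unwritten(written, 3)   # extended when units processing begins

written |= {0, 1}                               # borrowing marks up the units and tens
at_use = rightmost_unwritten(written, 3)        # extended when the answer is written

print(COLUMN_NAMES[at_creation])   # units
print(COLUMN_NAMES[at_use])        # hundreds
```

The description itself never changed. Only the problem state did, which is why the error cannot
be detected by inspecting the stored focus.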

More star bugs are generated when intensions are repaired. One example involves the
following conjunction of events: the same intension is used in two places; both uses cause impasses;
and different refocus repairs are applied at each impasse. Roughly speaking, non-star bugs in such
double-use cases require both refocusings to go the same direction. But such inter-impasse
communication would violate the repair-impasse independence principle, and cannot be permitted.
Extensional foci avoid the problem because the repair unit can only apply the refocus/relaxation
repair after it has popped the stack back to the description that originated the errant focus. Since
intensional foci can be repaired in situ, there is no reason to back the stack up, yet repair without
backing up leads to star bugs.

To sum up, there are problems for both the extensional focus and the intensional focus
approaches. It seems that extending descriptions just once, either at execution time or at usage time,
is insufficient. This urges considering a representation of focus that extends descriptions twice, both
at creation and use. This is what the current version of Sierra does. It is not a true solution since it
occasionally fails to work in ways that are clearly not ways that a human's procedure would fail. For
completeness, the system will be described, but I make no claims for its theoretical merit.

Using parse nodes to solve the interstate reference problem

The solution depends on the fact that the interface has a grammar separate from the patterns
on the rules. This design is argued for in chapter 13. Because the grammar makes it possible to
give a global parse to the problem state, focus can be represented by a node in the parse. A parse
node has a box in Cartesian coordinates associated with it, so it is somewhat like an extension.
However, it also has connections via the links in the parse tree to other nodes. In particular, it has
links to its constituents. Using parse nodes as foci takes care of most of the difficulties mentioned
above.

Since no trace of the description remains in the focus, when impasses that are caused by
misplaced foci are repaired via the refocus repair, the local problem solver must first back the stack
up until it gets to the description. In this respect, parse node foci mimic extensional foci and avoid
one pit that intensional foci fall into. Parse nodes also escape the other way that intensions lead to
star bugs. The details will be suppressed here.

The problem with the extensional focus approach was that the size of an aggregate object
could grow while a box was being held on the stack, making it too small at the time of its use. If
using a parse node meant simply using the box associated with it, the parse node approach would be
subject to the same difficulties with growing images that the extensional foci approach had. What if
it didn't use the box, but instead used the links from the node to the other parse nodes? In
particular, suppose that "use" of a parse node involved matching it against a current parse of the
problem state to find its closest approximation. Thus, if the problem state had not changed in any
significant way, an exact isomorph of the node would be found. If it had, there would be a good
chance that matching would find the "intended" node, the one that its description would reference if
the description were extended at usage time (roughly speaking).

What this leads up to is an interpretation process that obeys the applicative hypothesis in a
very literal way. It does not maintain a parse of the problem state in some global resource. Instead,
the problem state is continually being parsed for various reasons. Perhaps a little story would help
make this seem plausible. Imagine that people are equipped with a visual processor that is so
powerful that glancing at an algebraic expression is sufficient to parse it. Whenever a description
needs to be matched, a glance causes a parse tree to spring up, the description is matched against it,
and a pointer to the best fitting node is returned to begin its route through the data flow pathways
of the procedure. The portion of the parse tree that is not being pointed at gradually fades away.
(One should keep firmly in mind that this is fiction.) Later, when the parse node has to be used, it's
obsolete; the box it mentions has coordinates relative to the person's head position that are now
quite different; connection to the visual world has been lost over time. So the reading or writing
operation sends the parse node out to the visual processor, which glances at the problem state, gets a
new parse tree, matches the node against the tree, and returns the best fitting node with a bona fide
current (up to the millisecond!) box. This box the read/write operator can and does use.
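
The re-matching idea can be sketched in a few lines of Python. This is only an illustration under
stated assumptions, not the model's actual matcher: the names ParseNode, path_to and rematch are
hypothetical, and a focus is remembered here as a path of (category, child index) steps, which is
one simple stand-in for whatever matching criterion the interface really uses.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]        # (left, top, right, bottom); hypothetical layout
Path = Tuple[Tuple[str, int], ...]     # remembered route from the root to the focus


@dataclass
class ParseNode:
    category: str                      # e.g. "PROBLEM" or "COLUMN"
    box: Box                           # valid only for the parse it came from
    children: List["ParseNode"] = field(default_factory=list)


def path_to(root: ParseNode, target: ParseNode, path: Path = ()) -> Optional[Path]:
    """Describe target by the (category, child index) steps leading to it from root."""
    if root is target:
        return path
    for i, child in enumerate(root.children):
        found = path_to(child, target, path + ((child.category, i),))
        if found is not None:
            return found
    return None


def rematch(path: Path, fresh_root: ParseNode) -> ParseNode:
    """Follow the remembered path through a fresh parse; stop at the deepest fit."""
    node = fresh_root
    for category, index in path:
        if index < len(node.children) and node.children[index].category == category:
            node = node.children[index]
        else:
            break                      # closest approximation found so far
    return node


# Parse taken when the focus was set: three columns, units column last.
old_units = ParseNode("COLUMN", (20, 0, 30, 30))
old_parse = ParseNode("PROBLEM", (0, 0, 30, 30),
                      [ParseNode("COLUMN", (0, 0, 10, 30)),
                       ParseNode("COLUMN", (10, 0, 20, 30)),
                       old_units])
focus = path_to(old_parse, old_units)

# Later parse: scratch marks have made the columns taller.
new_parse = ParseNode("PROBLEM", (0, -10, 30, 30),
                      [ParseNode("COLUMN", (0, -10, 10, 30)),
                       ParseNode("COLUMN", (10, -10, 20, 30)),
                       ParseNode("COLUMN", (20, -10, 30, 30))])
current = rematch(focus, new_parse)
print(current.box)
```

Because the box is read off the fresh parse, a column that has grown is found with its current
extent rather than the stale one held on the stack.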

The point of this story is to give some plausibility to the notion that adherence to the
applicative hypothesis should be maintained by frequent parsing, despite the fact that any computer
scientist would blanch at such an extreme position on the space/time tradeoff.

Appendix 9
The Backup repair and the goal stack

This appendix contains several arguments concerning the goal stack. The arguments have
been banished from their proper places in chapters 9 and 11 because they are rather long. They
have been put together in this appendix because they all use the Backup repair. The appendix
begins by showing that the Backup repair exists. Then, section 2 uses Backup to show that a finite
state control regime will not suffice. Control flow must be recursive. The interpreter must use a
goal stack. Section 3 uses Backup to show that data flow must be locally bound. It must be kept
on the goal stack along with the instantiations of goals. Global binding, i.e., passing data in
registers, will not allow Backup to function in the way that it has been observed to function.

A9.1 The Backup repair exists

Control and data flow are not easily deduced by observing sequences of writing actions. Too
much internal computation can go on invisibly between observed actions for one to draw strong
inferences about control or data flow. What is needed is an event which can be assumed or proven,
in some sense, to be the result of an elementary, indivisible operation. The instances of this event
in the data would shed light on the basic structures of control and data flow. Such a tool is found
in a particular repair called the Backup repair. It bears this name since the intuition behind it is the
same as the one behind a famous strategy in problem solving: backing up to the last point where a
decision was made in order to try one of the other alternatives. This repair is so crucial in the
remaining arguments that it is worth a few pages to defend its existence.

The existence argument begins by demonstrating that a certain six bugs should all be
generated from the same core procedure. There are two arguments for this lemma. From the
lemma, it is argued that the Backup repair is the best way to generate the six bugs.

A Cartesian product of bugs

The bugs all lack an ability to borrow from zero. (Henceforth, "BFZ" will be used to
abbreviate "borrow from zero.") For easy reference, the six bugs will be broken into two sets called
big-BFZ and little-BFZ. Big-BFZ bugs seem to result from replacing the whole column processing
subprocedure whenever the column requires borrowing from a zero. Its bugs are:

Smaller-From-Larger-Instead-of-Borrow-From-Zero:

                     3 15        2 10
   3  4  5        3  4  5        3  0  7
 - 1  0  2      - 1  2  9      - 1  6  9
   2  4  3 ✓      2  1  6 ✓      1  4  2 X

Zero-Instead-of-Borrow-From-Zero:

                     3 15        2 10
   3  4  5        3  4  5        3  0  7
 - 1  0  2      - 1  2  9      - 1  6  9
   2  4  3 ✓      2  1  6 ✓      1  4  0 X

Blank-Instead-of-Borrow-From-Zero:

                     3 15        2 10
   3  4  5        3  4  5        3  0  7
 - 1  0  2      - 1  2  9      - 1  6  9
   2  4  3 ✓      2  1  6 ✓      1  4    X

(The small numbers stand for the student's scratch marks. Correctly answered problems are marked
with ✓ and incorrectly answered problems are marked with X.) When a column requires
borrowing from zero, as the units column does in the last problem, the first bug takes the absolute
difference instead of borrowing. The second bug answers the column with the maximum of zero
and the difference, namely zero. The third bug just leaves such BFZ columns unanswered. Notice
that all three bugs perform correctly when the borrow does not require BFZ, as in the tens column
of the last problem.

The little-BFZ bugs have a smaller substitution target, speaking roughly. Only the operations
that normally implement borrowing across the zero are replaced, namely the operations of changing
the zero to nine and borrowing from the next digit to the left. The three little-BFZ bugs are:

Borrow-Add-Decrement-Instead-of-Zero:

   3  4  5        3  4  5        3  0  7
 - 1  0  2      - 1  2  9      - 1  6  9
   2  4  3 ✓      2  1  6 ✓      1  5  8 X

Zero-Borrow-At-Zero:

   3  4  5        3  4  5        3  0  7
 - 1  0  2      - 1  2  9      - 1  6  9
   2  4  3 ✓      2  1  6 ✓      1  4  8 X

Stops-Borrow-At-Zero:

   3  4  5        3  4  5        3  0  7
 - 1  0  2      - 1  2  9      - 1  6  9
   2  4  3 ✓      2  1  6 ✓      1  4  8 X

In the first case, absolute difference has been substituted for borrow's decrement. Hence, the zero
in the third problem is changed to the absolute difference of zero and one, namely one. The
second bug, Zero-Borrow-At-Zero, is generated by substituting the max-of-zero-and-difference
operation for borrow's decrement. This causes the bug to cross out the zero of the last problem,
and write a zero over it. The third bug, Stops-Borrow-At-Zero, simply skips the borrow-from part
of borrowing when it is a zero that is to be decremented. (Zero-Borrow-At-Zero generates exactly
the same answer as Stops-Borrow-At-Zero. The scratch marks are the only way to tell them apart.
Since DEBUGGY is not given access to the scratch marks, it does not distinguish between the two
bugs. Both are called Stops-Borrow-At-Zero.) The following table summarizes the bugs in a
provocative way:





                     Big-BFZ                                            Little-BFZ

Absolute diff        Smaller-From-Larger-Instead-of-Borrow-From-Zero    Borrow-Add-Decrement-Instead-of-Zero
Max of zero, diff    Zero-Instead-of-Borrow-From-Zero                   Zero-Borrow-At-Zero
Noop                 Blank-Instead-of-Borrow-From-Zero                  Stops-Borrow-At-Zero
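
The two dimensions of the table can be made concrete with a small Python sketch. It is only an
illustration, not the model described in this report: the function names, the "size" and "function"
keywords, and the flattening of the impasse and repair machinery into two hard-coded branches
are all assumptions made for the example. It also assumes the top number is at least as large as
the bottom one and that, as in the problems above, the leftmost column never needs a borrow.

```python
from itertools import product

BUG_NAMES = {
    ("big", "abs"): "Smaller-From-Larger-Instead-of-Borrow-From-Zero",
    ("big", "max0"): "Zero-Instead-of-Borrow-From-Zero",
    ("big", "noop"): "Blank-Instead-of-Borrow-From-Zero",
    ("little", "abs"): "Borrow-Add-Decrement-Instead-of-Zero",
    ("little", "max0"): "Zero-Borrow-At-Zero",
    ("little", "noop"): "Stops-Borrow-At-Zero",
}


def patch_column(top, bottom, function):
    """Big patch: answer the whole column some other way."""
    if function == "abs":
        return str(abs(top - bottom))
    if function == "max0":
        return str(max(0, top - bottom))
    return "_"                          # noop: leave the column blank


def patch_decrement(function):
    """Little patch: what replaces 'decrement the zero' (0 - 1)."""
    if function == "abs":
        return abs(0 - 1)
    if function == "max0":
        return max(0, 0 - 1)
    return 0                            # noop: the zero is left alone


def solve(top, bottom, size, function):
    """Subtract with a core procedure that cannot borrow from zero.

    Returns the written answer as a string; "_" marks a column left blank.
    """
    t = [int(d) for d in reversed(str(top))]                    # units first
    b = [int(d) for d in reversed(str(bottom).zfill(len(t)))]
    answer = []
    for i in range(len(t)):
        if t[i] < b[i] and i + 1 < len(t):                      # column needs a borrow
            if t[i + 1] == 0:                                   # decrement-zero impasse
                if size == "big":
                    answer.append(patch_column(t[i], b[i], function))
                    continue
                t[i + 1] = patch_decrement(function)
            else:
                t[i + 1] -= 1                                   # ordinary decrement
            t[i] += 10
        answer.append(str(t[i] - b[i]))
    return "".join(reversed(answer))


for size, function in product(("big", "little"), ("abs", "max0", "noop")):
    print(BUG_NAMES[size, function], solve(307, 169, size, function))
```

One procedure with two patch sizes and three patch functions yields all six answer patterns, which
is the sense in which the bugs form a Cartesian product.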





The Cartesian product relationship between these six bugs is exactly the kind of pattern that
local problem solving captures. The most straightforward way to formalize it would be to postulate
two core procedures, one for big-BFZ and another for little-BFZ. However, all six bugs miss the
same kind of problems, namely just those problems that require borrowing from a zero. Intuitively,
they seem to have the same cause: The subskill of borrowing across zero is missing from the
subject's knowledge. This intuition is supported by the fact that subtraction curricula generally
contain a lesson that teaches borrowing from non-zero digits, followed by separate lessons that teach
BFZ. The incremental learning hypothesis (section 3.6) implies that all core procedures associated
with a certain segment of the lesson sequence will miss just the problems that lie outside the set of
examples and exercises of that segment. If two core procedures were used, then both core
procedures would have to be paired with the instructional segment that teaches borrowing from
non-zero digits. This means that whatever the mechanism is that implements learning, it must
explain how it could generate two core procedures that differed only by how much of the procedure
was repaired. This approach amounts to keeping the local problem solver simple by making the
learner more complex. On the other hand, the learner can be kept simple by having it generate just
one core procedure instead of two, then making the local problem solver a tad more complex by
having it generate all six bugs from that single core procedure. In short, it seems that either the
learner or the solver must be made complex. This argument doesn't say which one it should be.
The next argument shows that it is the solver that should be made complex.

Bug migration evidence for deriving all six bugs from the same core procedure

The next argument is based on an assumption about bug migration: If a student migrates
between two bugs, it is assumed that the two bugs are derived from the same core procedure via
different repairs to the same impasses. This assumption is defended in chapter 6. Given it, all that
has to be done to show that the big-BFZ bugs come from the same core procedure as the little-BFZ
bugs is to find a case of a big-BFZ bug migrating with a little-BFZ bug. Figure A9-1 exhibits such
a migration.

Figure A9-1 shows the first six problems of a subtraction test taken by subject 19 of
classroom 20. This third grader gets the first four problems right, which involve only simple
borrowing. He misses the next two, which require borrowing from zero. Crucially, these two
problems are solved as if the subject had two different bugs from the Cartesian product pattern.
This is an instance of intra-test bug migration. The fifth problem is solved by a little-BFZ bug:
The student hits the impasse (note the scratch mark through the zero), and repairs it by skipping
the decrement, a repair that generates the bug Stops-Borrow-At-Zero. He finishes up the rest of the
problem without borrowing; apparently he wants to "cut his losses" on that problem. On the
next problem, he again hits the decrement-zero impasse, but repairs it this time by backing up and
taking the absolute difference in the column that originated the borrow, the units column. This
generates the bug Smaller-From-Larger-Instead-of-Borrow-From-Zero. Since both bugs are in the
same bug migration class, both are somehow derived from the same core procedure via repair.

Figure A9-1
First six problems of a test that shows a bug migration.

Backup explains the Cartesian product pattern

Two arguments have forced the conclusion that all six bugs come from the same core
procedure. Now the problem is to find repairs that will generate all of them, given that they all
stem from the same impasse. One way to do that would be to use six separate repairs. However,
that would not capture a fact about these bugs that was highlighted in their description: they fall
into a Cartesian product pattern whose dimensions are the "size" of the patch (i.e., just the
decrement, versus the whole borrowing operation) and its "function" (i.e., taking the absolute
difference, versus skipping an operation, versus taking the maximum of zero and difference). It will
be shown that this pattern can be captured by postulating a Backup repair, and hence that the
Backup approach is more descriptively adequate.

The Backup repair resets the execution state of the interpreter back to a previous decision
point in such a way that when interpretation continues, it will choose a different alternative than the
one that led to the impasse that Backup repaired. The Backup repair is used for the big-BFZ bugs.
Using Backup in those cases causes a secondary impasse. The secondary impasse is repaired with
the same repairs that are used for the little-BFZ bugs. This is perhaps a little confusing, so it is
worth a moment to step through a specific example.

Figure A9-2 is an idealized protocol of a subject who has the bug Smaller-From-Larger-
Instead-of-Borrow-From-Zero. The (idealized) subject does not know about borrowing from zero.
When he tackles the problem 305-167, he begins by comparing the two digits in the units column.
Since 5 is less than 7, he makes a decision to borrow (episode a in the figure), a decision that he
will later come back to. He begins to tackle the first of borrowing's two subgoals, namely
borrowing-from (episode b). At this point, he gets stuck since the digit to be borrowed from is a
zero and he knows that it is impossible to subtract a one from a zero. He's reached an impasse.
The Backup repair gets past the decrement-zero impasse by "backing up," in the problem solving
sense, to the last decision which has some alternatives open. The backing up occurs in episode c
where the subject says "So I'll go back to doing the units column." In the units column he hits a
second impasse, saying "I still can't take 7 from 5," which he repairs ("so I'll take 5 from 7
instead"). He finishes up the rest of the problem without difficulty. His behavior is that of
Smaller-From-Larger-Instead-of-Borrow-From-Zero. The other big-BFZ bugs would be generated if
he had used different repairs in episode c (e.g., Zero-Instead-of-Borrow-From-Zero would be
generated if he reasoned, "I still can't take 7 from 5, but if I could, I certainly wouldn't have
anything left, so I'll write 0 as the answer.")

Summary

It has been shown that the Backup repair is the best of several alternatives that generate the
six bugs. It plays an equally crucial role in the generation of many other bugs, but the argument
stuck with just six bugs for the sake of simplicity. Backup is the tool that will be used to support
several hypotheses about core procedures.

a.    305      In the units column, I can't take 7 from 5, so I'll
    - 167      have to borrow.

b.    305      To borrow, I first have to decrement the next
    - 167      column's top digit. But I can't take 1 from 0!

c.    305      So I'll go back to doing the units column. I still can't
    - 167      take 7 from 5, so I'll take 5 from 7 instead.
        2

d.    305      In the tens column, I can't take 6 from 0, so I'll have to borrow.
    - 167      I decrement 3 to 2 and add 10 to 0. That's no problem!
        2

e.    305      Six from 10 is 4. That finishes the tens. The hundreds is
    - 167      easy, there's no need to borrow, and 1 from 2 is 1.
      142

Figure A9-2
Pseudo-protocol of a student performing the bug
Smaller-From-Larger-Instead-of-Borrow-From-Zero.

A9.2 Backup requires a goal stack

The Backup repair sends control back to some previous decision. The question is, which
decision? There are three well known backup regimes used in AI:

1. Chronological Backup: The decision that is returned to is the one made most recently,
regardless of what part of the procedure made the decision.

2. Dependency-directed Backup: A special data structure is used to record which actions depend
on which other actions. When it is necessary to back up, the dependencies are traced to find
an action that doesn't depend on any other action (an "assumption" in the jargon of
dependency-directed backtracking). That decision is the one returned to.

3. Hierarchical Backup: To support Hierarchical Backup, the procedure representation language
must be hierarchical in that it supports the notion of goals with subgoals, and the interpreter
must employ a goal stack. In order to find a decision to return to, Backup searches the goal
stack starting from the current goal, popping up from goal to supergoal. The first (lowest)
goal that can "try a different method" is the one returned to. Such a goal must have subgoals
that function as alternative ways of achieving the goal, and moreover, some of these
alternative methods/subgoals must not have been tried by the current invocation of the goal.
When Backup finds such a goal on the stack, it resets the interpreter's stack in such a way
that when the interpreter resumes, it will call one of the goal's untried subgoals. (In AI, this
is not usually thought of as a form of Backup. It is sometimes referred to by the Lisp
primitives used to implement it, e.g., THROW in Maclisp, and RETFROM in Interlisp.)

The key difference among these backup regimes is, intuitively speaking, which decision points the
interpreter "remembers." These establish which decision points the Backup repair can return to. In
Chronological and Dependency-directed backtracking, the interpreter "remembers" all decision
points. In Hierarchical Backup, it forgets a decision point as soon as the corresponding goal is
popped. The critical case to check is whether students ever back up to decision points whose
corresponding goals would be popped if goal stacks were in use. If they don't return to such
"popped" decision points, then Hierarchical Backup is the best model of their repair regime. On
the other hand, if students do return to "popped" decisions, then either Chronological or
Dependency-directed Backup is the better model. This section argues that students never back
up to popped decision points. By "never," I mean that returning to popped decision points
generates star bugs. The evidence to be presented vindicates Hierarchical Backup, and shows that
(1) procedures' static structure has a goal-subgoal hierarchy, and (2) a goal stack is used by the
interpreter in executing the procedure. In short, push down automata are better models of control
structure than finite state automata!
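
The point that the regimes differ in what the interpreter "remembers" can be sketched in Python.
The sketch below is an assumption-laden illustration, not the model's implementation: the Decision
record, the backup function, and in particular the lists of untried alternatives are hypothetical.
The same search is run over two different memories, a chronological history that keeps every
decision and a goal stack that keeps only the goals not yet popped.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Decision:
    goal: str                                          # goal at which the choice was made
    column: str                                        # column the goal was working on
    untried: List[str] = field(default_factory=list)   # alternatives still open (assumed)


def backup(remembered: List[Decision]) -> Optional[Decision]:
    """Return the most recently remembered decision that still has an open alternative."""
    for decision in reversed(remembered):
        if decision.untried:
            return decision
    return None


# Decrement-zero impasse in 305 - 167: nothing has been popped yet, so the
# chronological history and the goal stack lead back to the same decision.
sub1col = Decision("Sub1Col", "units", untried=["diff"])
history = [sub1col]
goal_stack = [sub1col, Decision("Borrow", "units"), Decision("Borrow-From", "tens")]

# Impasse in the units column of 307 - 169 after a completed borrow across zero:
# the borrow goals have been popped, but a chronological history still holds them.
star_history = [
    Decision("Sub1Col", "units"),
    Decision("Borrow-From", "tens", untried=["decrement"]),
    Decision("Borrow-From", "hundreds", untried=["BFZ"]),
]
star_stack = [Decision("Sub1Col", "units"), Decision("Diff", "units")]

print(backup(history).goal, backup(goal_stack).goal)   # both regimes agree here
print(backup(star_history))                            # a popped decision is resumed
print(backup(star_stack))                              # no popped decision is reachable
```

In the second situation the history-based search resumes a decision whose goal has already been
popped, while the stack-based search cannot reach it; in this sketch it simply finds no target, so
some other repair would have to be used.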



Chronological Backup

Chronological Backup is able to generate the Backup bugs mentioned in the previous section.
The walk-through of figure A9-2 should be evidence enough of that. However, by the impasse-
repair independence principle, Chronological Backup can be used to repair any impasse. This
causes problems. When Chronological Backup is applied to the impasses of certain independently
motivated core procedures, it generates star bugs. The motivation for the core procedure in
question is found in bugs such as:

After completing the borrow-from half of borrowing, these bugs fail to add ten to the top of the
column borrowed into (cf. the top of the units column in the second problem). Instead, Smaller-
From-Larger-With-Borrow answers the column with the absolute difference, and Zero-After-Borrow
answers the column with zero. Apparently, and quite reasonably, the core procedure is hitting an
impasse when it returns to the original column after borrowing. Since the column has not had ten
added to the top digit as it should, the bottom digit is still larger than the top. This causes the
impasse, which is seen being repaired two different ways in the two bugs.

With the core procedure thus independently motivated, one should find that a bug is
generated when Chronological Backup is applied to the impasse. Instead, one finds a star bug.
Consider a BFZ problem such as the last one above. The following are the first few problem states
as the star bug solves it:

[Problem states a through e of 307 - 169, with the student's scratch marks]

First the core procedure borrows across zero (problem states a, b and c), then it reaches the
impasse in the originating column just prior to problem state d. It is going to back up to the most
recent decision. Its most recent decision was that the digit that it was to borrow from in the
hundreds column was non-zero and hence could be decremented, thereby finishing up the BFZ.
This decision occurred just prior to b. Since it is the most recent decision chronologically,
Chronological Backup causes control to go back to it and take its other alternative, which is to BFZ.
The procedure starts the BFZ by adding ten to the hundreds column, as in state d. It tries to
decrement the next column left, the thousands, but no such column appears. An impasse occurs.
Suppose the student repairs it with the Noop repair, causing the decrement to be skipped. The
BFZ continues, decrementing the tens (state e). When it gets done with this superfluous borrowing,
it comes right back to the original column, which still has a too-small top digit, so the impasse
occurs again. If it is repaired with Chronological Backup again, control goes back to the hundreds
again! Repeated use of Chronological Backup results in an infinite loop, and a rather bizarre one
at that. Even if the second occurrence is repaired some other way, going back to re-borrow in the
first place is extremely odd behavior. Clearly, this is a star bug, and should not be predicted to
occur by the theory. Chronological Backup is ruled out because it causes the theory to
overgenerate.

Dependency-directed Backup

The basic idea of Dependency-directed Backup is to return to an action that doesn't depend
on other actions. Dependency-directed Backup is really not a precise proposal for this domain until
the meaning of "actions depending on other actions" is defined. Consideration of the star bug that
was just described leads to a plausible definition. Part of what makes the star bug absurd is its
going back to change columns to the left of the one where the impasse occurred. It seems clear
that precondition violations in one column don't depend causally on writing actions in a different
column. Suppose dependency between actions is defined to mean that the actions operate on the
same column, or more generally, have the same locative arguments. Dependency-directed Backup
will not make the mistake that Chronological Backup did. It will never go to another column to fix
an impasse that occurs in this column.

However, even this rather vague definition limits Dependency-directed Backup too strongly.
Several examples of Backup were presented earlier (i.e., the big-BFZ bugs) where the location was
shifted. For instance, in the problem state sequence of figure A9-2, Smaller-From-Larger-Instead-of-
Borrow-From-Zero shifted to a decision in the units column from an impasse in the tens column.
Dependency-directed Backup can't generate such location-shifting Backups. Hence, it can't generate
the big-BFZ bugs. As it is presently defined, it causes the theory to undergenerate.
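
The same-column definition of dependency can be sketched as a filter over decision points. This is
a hypothetical illustration only: the Decision record and the has_alternative flags are assumptions
made for the example, and "depends on" is reduced to "was made in the same column."

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Decision:
    goal: str
    column: str
    has_alternative: bool      # assumed bookkeeping: is some other choice still open?


def dependency_directed_backup(decisions: List[Decision],
                               impasse_column: str) -> Optional[Decision]:
    """Return the latest open decision made in the same column as the impasse."""
    for decision in reversed(decisions):
        if decision.has_alternative and decision.column == impasse_column:
            return decision
    return None


# Star-bug situation: the impasse is in the units column, and the only open
# decisions lie in columns to the left. The filter rightly finds nothing.
star_case = [
    Decision("Sub1Col", "units", False),
    Decision("Borrow-From", "tens", True),
    Decision("Borrow-From", "hundreds", True),
]

# Big-BFZ situation: the impasse is in the tens column, and the decision that
# students actually return to was made in the units column. It is filtered out too.
big_bfz_case = [Decision("Sub1Col", "units", True)]

print(dependency_directed_backup(star_case, "units"))
print(dependency_directed_backup(big_bfz_case, "tens"))
```

The filter that blocks the star bug also blocks the observed location-shifting Backup, which is the
undergeneration just described.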

Hierarchical Backup

In essence, these two approaches to backing up show that neither time nor space suffice. That
is, such natural concepts as chronology or location will not support the kind of backing up that
subjects apparently use. That leaves one to infer that Backup must be using some knowledge about
the procedure itself. The issue that remains is what this extra knowledge is. A goal hierarchy is
one sort of knowledge that will do the job, as the following demonstration shows.

The basic definition of Hierarchical Backup is that it can only resume decisions which are
supergoals of the impasse that it is repairing. With this stipulation, any one of a number of goal
structures will suffice to block the star bug that Chronological Backup generated as well as generate
all the Backup bugs that have been presented so far. One such goal structure is shown in figure
A9-3. This figure shows the goal-subgoal relationships with arrows. In this goal structure, the
borrow-from goal (the goal that tests whether the digit to be borrowed from is zero) is not a
supergoal of the Diff goal (the operation that reaches an impasse in the star bug's generation).
There would be a chain of arrows from borrow-from to Diff if it were. Hence, when Backup occurs
at the Diff impasse, it cannot go to the borrow-from decision even though that decision is
chronologically the most recent.

So far, it has only been shown that the Backup repair obeys a hierarchy. It has not been shown
that the goal hierarchy has anything to do with control of regular execution. But if one simply
postulated that the goal hierarchy existed off to the side, with no purpose other than to constrain
Backup, then the theory would be sorely damaged because there would be no way to test or refute
this hypothesis. A more reasonable suggestion is that regular execution uses the hierarchical goal
structure in such a way that Hierarchical Backup is a natural consequence of the structure of the
internal, runtime information. The usual goal stack has this property. At any time, the goals on the
stack are all supergoals of the currently executing goal. Postulating a goal stack naturally explains
why Backup returns to supergoals: those are the only decision points that are explicitly saved in
the runtime state. The goal stack architecture explains why the Backup repair behaves the way it
does.

Figure A9-3
A goal hierarchy that will allow Hierarchical Backup to function correctly.

Summary

First, the existence of a single core procedure for a set of six bugs was demonstrated. To
generate the bugs, a Backup repair was postulated. Three kinds of Backup were considered.
Chronological Backup used time to determine which decision points to back up to. Dependency-
directed Backup used spatial locations. Presumably, these two natural kinds of information, time
and space, would be accessible in a finite state machine architecture. However, they both proved
ineffective in capturing the facts about the Backup repair's behavior. This showed that a new kind
of information, a goal hierarchy, had to become a part of the procedures and a part of the
execution state. That is, the finite state machine architecture had to be augmented with a goal
stack. In short, to explain the Backup repair, the underlying control structure has to be that of a
push down automaton.

I. Fact 1: Smaller-From-Larger-Instead-of-Borrow-From-Zero
   A. The schema-instance architecture generates the bug
   B. Registers
      1. A single register generates a star bug
      2. "Smart" Backup is irrefutable
      3. Multiple registers allows generation of the observed bug
II. Fact 2: Borrow-Across-Zero, left-ten
   A. The schema-instance architecture generates the bug
   B. Registers
      1. One register per goal: can't generate the bug
      2. One register per object: can't generate the bug
      3. "Smart" Backup is irrefutable
      4. Duplicate borrow-from goals: entails infinite procedure
      5. Duplicate borrow-from goals: equivalent to schema-instance

Figure A9-4
Outline of the argument between the register hypothesis and the schema-instance hypothesis



A9.3 Focus is locally bound

The argument in this section concerns how to represent focus of attention, or data flow as it
was labelled in chapter 11. Two alternative architectures will be discussed: register-based and
schema-instance. A register-based architecture is one that employs globally accessible memory
resources, like the registers in a microprocessor. The schema-instance architecture binds its memory
resource into the control flow, just as a schema's instances hold extra information. This distinction
will become clearer in a moment. As it turns out, just two facts are needed. One is the bug
described earlier, Smaller-From-Larger-Instead-of-Borrow-From-Zero. The other fact will be
introduced in a moment. The argument is organized around these two facts. Figure A9-4 is an
outline of the argument.

It will be shown that the schema-instance architecture generates the first bug (I.A in the
outline). However, the simplest version of the register-based architecture, the use of a single focus
register, generates a star bug instead (I.B.1). Patching its difficulties by making the Backup repair
more complex leads to problems with retaining the falsifiability of the theory (I.B.2). However,
using several registers instead of just one register allows the bug to be generated simply (I.B.3). So
the conclusion to be drawn from the first fact is that the register approach will be adequate only if
there is more than one focus register.

The second part of the argument introduces a new bug involving the Backup repair. Once
again, the schema-instance architecture predicts the facts correctly (II.A). Two different
implementations of multiple registers fail (II.B.1 and II.B.2) by generating star bugs. A smarter
Backup would fix the problem but remains methodologically undesirable (II.B.3). Postulating
various complications to the goal structure of the procedure (II.B.4 and II.B.5) allows the correct
predictions to be generated, but they have problems of their own. So the second part concludes
that the register-based alternative is inadequate even when various complications are introduced. In
overview, the argument is a nested argument-by-cases where all the cases except one are eliminated.
To aid in following it, the cases will be labelled as they are in the outline of figure A9-4.

I.A Schema-instance generates the bug

The basic idea of a schema-instance architecture is that the location of a goal is strongly
associated with the goal at the time it is first set. That is, when a goal such as borrowing is
invoked, it is invoked at a certain column, or more generally, at a certain physical location in the
visual display of the problem. In a schema-instance architecture, this association between goal and
location, which is formed at invocation time, persists as long as the goal remains relevant. That is,
the goal is a schema, which is instantiated by substituting specific locations, numbers or other data
into it. If the control structure is known to be a stack, one would say that the focus is locally
bound. Many computer languages, such as Lisp, have a schema-instance architecture: A function is
instantiated by binding its arguments when it is called, and its arguments retain their bindings as
long as the function is on the stack. However, in the interest of factoring the various parts of the
theory independently, we will not assume that control is recursive.

The schema-instance architecture allows the bug of figure A9-1 to be generated quite
naturally. Suppose the Sub1Col goal were strongly associated with its location, namely the units
column, in some short term memory associated with Sub1Col's instantiation. Backup causes the
resumption of the goal at the stored location. Another way to think of this is that the interpreter is
maintaining a short term "history list" that temporarily stores the various invocations of goals with
their locations. In regular execution, when the borrowing goal finishes, the Sub1Col goal is
resumed at the same place as it started. That is, in the long-term representation of the procedure,
the Sub1Col goal is a schema with its location abstracted out. It is bound to a location
(instantiated) when it is invoked. It is the instantiated goal that Backup returns to, not the
schematic one.

This schema-instance distinction, which is at the heart of almost all modern programming
languages, entails the existence of some kind of temporary memory to store the instantiations of
goals, and thus motivates this way of implementing the Backup repair. But there are of course
other ways to account for focus shifting during Backup. Several will be examined and shown to
have fewer advantages than the schema-instance one.
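
As a minimal sketch of the schema-instance idea (not the model's actual implementation), the
Python below keeps a short-term history list of goal instances, each binding a schematic goal to
the location where it was invoked. The class names, the field names and the open_alternatives flag
are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class GoalInstance:
    """A schematic goal bound to a location at invocation time."""
    goal: str                 # name of the schematic goal, e.g. "Sub1Col"
    location: str             # column the instance was invoked at
    open_alternatives: bool   # True if untried alternatives remain


@dataclass
class Interpreter:
    history: list = field(default_factory=list)  # active instances, oldest first

    def invoke(self, goal, location, open_alternatives=False):
        self.history.append(GoalInstance(goal, location, open_alternatives))

    def backup(self):
        """Abandon the stuck goal; return the most recent instance with alternatives."""
        self.history.pop()  # the goal that hit the impasse
        while self.history and not self.history[-1].open_alternatives:
            self.history.pop()
        return self.history[-1] if self.history else None


interp = Interpreter()
interp.invoke("Sub1Col", "units", open_alternatives=True)
interp.invoke("Borrow", "units")
interp.invoke("Borrow-from", "tens")   # impasse: the top digit here is zero
resumed = interp.backup()
print(resumed.goal, resumed.location)
```

Because the location is stored with the instance, Backup needs no focus-shifting logic of its own:
resuming the instance resumes its location.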

I.B.1 A single register architecture generates a star bug

Suppose that instead of using the schema instances to implement data flow, the architecture
uses a single register, a you-are-here pointer to some place in the current problem state. There
would be no problem representing the subtraction procedure in such an architecture: Actions in the
procedure's representation would change the contents of this register as the various goals are
invoked.

However, if the you-are-here register is simply left alone during backing up, then a star bug is
generated. It is illustrated in figure A9-5, episode (b). Backup resumes the Sub1Col goal, but
the you-are-here register is not restored to the units column. Instead, the tens column is processed.
The units column is left with no answer despite the fact that its top digit has had ten added to it.
In the judgment of expert diagnosticians, this behavior would never be observed among subtraction
students. It is a star bug. The theory should not predict its occurrence.

a.    4 0¹2     In the units column, I can't take 6 from 2, so I'll
    - 1 0 6     have to borrow. First I'll add ten to the 2.

b.    4 0¹2     I'm supposed to decrement the top zero, but I can't!
    - 1 0 6     So I guess I'll back up to processing the column.

c.    4 0¹2     Processing it is easy: 0-0 is 0.
    - 1 0 6
        0

d.    4 0¹2     The hundreds is also easy. I'm done!
    - 1 0 6
      3 0

Figure A9-5
Pseudo-protocol of a student performing a star bug

I.B.2 A smart Backup repair makes the theory too tailorable

To avoid the star bug, the Backup repair would have to employ an explicit action to restore
the register to the units column in episode c of the protocol of figure A9-1. But how would it
know to do this? Backup would have to determine that the focus of attention should be shifted
rightward by doing an analysis of the goal structure contained in the stored knowledge about the
procedure. It would see that in normal execution, a locative focus shifting function was executed
between the Borrow-from goal and the Sub1Col goal. For some reason, it decides to execute the
inverse of this shift as it transfers control between the two goals.

Not only does this implementation make unmotivated assumptions, but it grants Backup the
power to do static analyses of control structure. This would give it significantly more power than the
other repairs, which do simple, local things like skipping the stuck operation. Postulating a smart
Backup gives the local problem solver so much power that one could "explain" virtually any
behavior by cramming the explanation into the black boxes that are repairs. That is, it gives the
theory too much tailorability. It is much better to make the repairs as simple as possible by
embedding them in just the right architecture.

I.B.3 Multiple registers allow generation of the bug

Another way to implement Backup involves using a set of registers. The registers have some
designated semantics, such as "most recently referenced column" or "most recently referenced
digit." That is, the registers could be associated with the type or visual shape of the locations
referenced, as Smalltalk's class variables are. Alternatively, they could be associated with the
schematic goals. Old programming languages used to implement a subroutine's variables this way
by allocating their storage in the compiled code, generally right before the subroutine's entry point.
Sub1Col would have a register, Borrow would have a different register, and so on.

Given this architecture, Backup is quite simple. Returning to Sub1Col requires no locative
focus shifting on its part. Since the Sub1Col register (or the column register, if that's the semantics)
was not changed by the call to Borrow, it is still pointing at the units column when Backup causes
control to return to Sub1Col. This multi-register implementation is competitive with the
schema-instance one as far as its explanatory power. Backup is simple and local. Moreover, the data
flow architecture has motivation independent of the Backup repair in that it is used during normal
interpretation. However, the multi-register approach fails to account for certain empirical facts that
will now be exposed.

II.A Another bug, and schema-instance can generate it

The argument in this case, case II, is similar to that of case I. It takes advantage of
subtraction's recursive borrowing to exhibit Backup occurring in a context where there are two
instantiations of the Borrow goal active at the same time. There are two potential destinations for
Backup. It will be shown that the schema-instance mechanism is necessary to make empirically
correct predictions.

A common bug is one that forgets to change the zero when borrowing across zero. This leads
to answers like:

      4 0 2
    - 1 3 9
      1 7 3

The 4 was decremented once due to the borrow originating in the units column, and then again due
to a borrow originating from the tens column because the tens column was not changed during the
first borrow, as it should have been. This bug is called Borrow-Across-Zero. It is a common bug.
Of 417 students with bugs, 51 had this bug.

An important fact is seen in figure A9-6. The bug decrements the one to zero during the
first borrow. Thus, when it comes to borrow a second time, it finds a zero where the one was, and
performs a recursive invocation of the borrow goal. This causes an attempt to decrement in the
thousands column, which is blank. An impasse occurs. The answer shown in the figure is
generated by assuming the impasse is repaired with Backup. This sends control back to the most
recently invoked goal that has alternatives. When Backup is trying to decide which goal to return
to, the active goals are

    Borrow-from         (the recursive invocation located at the thousands column)
    Borrow-from-zero    (at the hundreds column)
    Borrow-from         (at the hundreds column)
    Borrow              (at the tens column)
    Sub1Col             (at the tens column)
    Multi
    Sub

In this core procedure, the Borrow-from-zero goal has no alternatives. It should always both write a
nine over the zero and Borrow-from the next column, although here the write-nine step has been
forgotten. The Borrow-from goal has alternatives because it has to choose between ordinary,
non-zero borrowing and borrowing from zeros. Since Borrow-from was the most recently invoked goal
that has alternatives left, Backup returns to it. Execution resumes by taking its other alternative, the
one that was not taken the first time. Hence, an attempt is made to do an ordinary Borrow-from,
namely a decrement. Crucially, this happens in the hundreds column, which has a zero in the top.
The attempt to decrement zero causes a new impasse. We see that it is the hundreds column that
was returned to because the impasse was repaired by substituting an increment for the blocked
decrement, causing the zero in the hundreds column to be changed to a one.

The crucial fact is that Backup shifted the focus from the thousands column to the hundreds
column, even though both the source and the destination of the backing up were Borrow-from
goals. This shift is predicted by the schema-instance architecture. It takes only a moment to show
that the empirical adequacy of the register architecture is not so high.
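
The Borrow-Across-Zero arithmetic described in this section can be checked with a small
simulation. The sketch below is only an illustration under stated assumptions: digits are stored
units first, and the function names and the Impasse exception are hypothetical, not part of the
model. It reproduces the answer 173 for 402 - 139 and runs into an impasse on 102 - 39, where the
second borrow finds nothing left to decrement.

```python
class Impasse(Exception):
    """Raised when the procedure reaches a step it cannot carry out."""


def borrow_across_zero(top, bottom):
    """Subtract column by column with the Borrow-Across-Zero bug.

    Digits are listed units first. The bug skips over a zero when borrowing
    and leaves it unchanged instead of rewriting it as nine.
    """
    top = list(top)
    answer = []
    for col in range(len(top)):
        b = bottom[col] if col < len(bottom) else 0
        if top[col] < b:
            source = col + 1
            while source < len(top) and top[source] == 0:
                source += 1              # buggy step: the zero is skipped, not changed
            if source >= len(top):
                raise Impasse("nothing left to decrement")
            top[source] -= 1
            top[col] += 10
        answer.append(top[col] - b)
    return answer


def to_digits(n):
    return [int(d) for d in reversed(str(n))]


def from_digits(digits):
    return int("".join(str(d) for d in reversed(digits)))


result = from_digits(borrow_across_zero(to_digits(402), to_digits(139)))
print(result)
```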

a.    0         Since I can't take 9 from 2, I'll borrow. The next column is 0, so
      1 0¹2     I'll decrement the 1, then add 10 to the 2. Now I've got 12 take
    -   3 9     away 9, which is 3.
          3

b.    0         Since I can't take 3 from 0, I'll borrow. The next digit is 0,
      1 0¹2     but there isn't a digit after that!
    -   3 9
          3

c.    0         I guess I could quit, but I'll go back to see if I can fix things up.
      1 0¹2     Maybe I made a mistake in skipping over that 0, so I'll go
    -   3 9     back there.
          3

d.    1         When I go back there, I'm still stuck because I can't take 1 from 0.
      1 0¹2     I'll just add instead.
    -   3 9
          3

e.    1         Now I'm okay. I'll finish the borrow by adding 10 to the ten's
      1¹0¹2     column, and 3 from 10 is 7. The hundreds is easy, I just bring
    -   3 9     down the 1. Done!
      1 7 3

Figure A9-6
Pseudo-protocol of a student performing a variant of
the bug Borrow-Across-Zero.

II.B.1 One register per goal can't generate the bug

Suppose each schematic goal has its own register. Borrow-from would have a register, and it
would be set to the top digit of the thousands column at the first impasse (episode b in figure
A9-6). Hence, if Backup returns to the first invocation of Borrow-from, the register will remain set
at the thousands column. Hence, Backup doesn't generate the observed bug of figure A9-6. In
fact, it can't generate it at all: The only register focused on the hundreds column is the one
belonging to the Borrow-from-zero goal. That goal has no open alternatives, so Backup can't return
to it. Even if it did, it wouldn't generate the bug of figure A9-6. So one register per goal is an
architecture that is not observationally adequate.
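
The contrast can be made concrete with a small sketch. It assumes, purely for illustration, that
each invocation is recorded as a (goal, column) pair; the two helper functions are hypothetical
names. With one register per schematic goal, the recursive invocation of Borrow-from overwrites the
earlier binding; with one binding per instance, the earlier binding survives.

```python
def run_per_goal_registers(invocations):
    """One register per schematic goal: a later invocation overwrites the register."""
    registers = {}
    for goal, column in invocations:
        registers[goal] = column
    return registers


def run_schema_instances(invocations):
    """One binding per invocation: earlier bindings survive recursion."""
    return list(invocations)


# Active goals at the first impasse of figure A9-6, oldest first.
invocations = [
    ("Sub1Col", "tens"),
    ("Borrow", "tens"),
    ("Borrow-from", "hundreds"),
    ("Borrow-from-zero", "hundreds"),
    ("Borrow-from", "thousands"),   # recursive invocation, stuck here
]

registers = run_per_goal_registers(invocations)
instances = run_schema_instances(invocations)

# Backup returns to the earlier Borrow-from invocation (index 2 above).
register_focus = registers["Borrow-from"]
instance_focus = instances[2][1]
print(register_focus, instance_focus)
```

Only the instance-based record still points at the hundreds column, which is where the observed
bug resumes.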

II.B.2 One register per object doesn't generate the bug

Assuming the registers are associated with object types fails for similar reasons. Both impasses
(episodes b and d) involve the same type of visual object, a digit, and hence the corresponding
register would have to be reset explicitly by Backup in order to cause the observed focus shift.

II.B.3 Smart Backup makes the theory too tailorable

But providing Backup with an ability to explicitly reset registers would once again require it
to do static analysis of control structure, an increase in power that should not be granted to
repairs.

II.B.4 Duplicate borrow-from goals

One could object that we have made a tacit assumption that it is the same (schematic)
Borrow-from that is called both times. If there were two schematic Borrow-froms, one for an
adjacent borrow, and one for a borrow two columns away from the column originating the borrow,
then they could have separate registers. This would allow Backup to be trivial once more.
However, this argument entails either that one have a subtraction procedure of infinite size, or that
there be some limit on the number of columns away from the originating column that the
procedure can handle during borrowing. Both conclusions are implausible.

II.B.5 Duplicate borrow goals

One could object that there is another way to salvage the multiple register architecture.
Suppose that the schematic procedure is extended by duplicating Borrow goals (plus registers) as
needed. The bug could be generated, but this amounts either to a disguised version of schemata
and instantiations, or an appeal to some powerful problem solver (which then has to be explained
lest the theory lapse into infinite tailorability). So, this alternative is not really tenable either.

Conclusion 

This rather lengthy argument concludes with the schema-instance architecture the only one
left standing. What this means is that representations that do not employ schemata and
instances, such as finite state machines with registers or flow charts, can be dropped from
consideration. This puts us, roughly speaking, on the familiar ground of "modern" representation
languages for procedures, such as stack-based languages, certain varieties of production systems,
certain message-passing languages, and so on.

Appendix 10
Satisfaction Conditions 



This appendix presents arguments concerning what types the procedure representation
language should allow for goals. It continues a line of argument begun in chapter 10. It contrasts
two hypotheses:

1. And-Or: Goals have a binary type. If a goal's type is AND, all applicable subgoals are
executed before the goal is popped. If it is OR, the goal pops as soon as one subgoal is
executed.

2. Satisfaction conditions: Goals have a condition which is tested after each subgoal is executed.
If the condition is true, the goal is popped. Metaphorically speaking, the goal keeps trying
different subgoals until it is satisfied.

Satisfaction conditions were used in the version of repair theory presented in Brown and
VanLehn (1980). First the arguments in their favor will be presented (some of these are repeated
from chapter 10 so that the appendix will be relatively self contained). Then arguments against
them will be presented, and shown to be slightly stronger. The arguments concern how to constrain
the deletion operator in such a way that it will continue to generate the deletion bugs (see chapter
7) but it will not generate certain star bugs. It has already been shown (in section 10.3) that
limiting the operator to delete only AND rules allows it to generate all the deletion bugs while
preventing it from generating many star bugs. However, it still allows a few star bugs to be
generated. These star bugs will be examined in detail in order to motivate a way of blocking their
deletion.

A10.1 Satisfaction conditions block certain star bugs

Suppose that the main loop of subtraction, which traverses columns, has the following goal
structure when it is expressed using And-Or goal types. (The following is a translation of the goal
structure of figure 10-1 into an And-Or representation. The rule numbers are the same.)

Goal: Multi (C)   Type: AND
   3. (Sub1Col C)
   4. (SubRest (Next-column C))

Goal: SubRest (C)   Type: OR
   5. C is not the leftmost column => (Multi C)
   6. C's bottom is blank => (Show C)
   7. true => (Diff C)

The applicability conditions for the AND rules have been omitted. The applicability conditions for
the OR goal, SubRest, reflect the fact that they are tested in order and only one is executed. For
instance, if the column C is the leftmost column, then rule 5 will not be applicable, and control
moves on to test rule 6. If that rule applies, the primitive Show answers the column, then control
returns to SubRest. Since the goal is marked as an OR goal, and one rule has been executed, no
more rules are tested. In particular, the default rule, 7, will not be tested. Instead, the SubRest
goal is popped.

The Multi goal is an AND goal, so either of its subgoals can be deleted. Deleting rule 4
creates a bug that only does the units column. Intuitively, only doing one column would be the
mark of a student who has not yet been taught how to do multiple columns. Since doing multiple
columns is always taught before borrowing, it would be highly unlikely for a student to know all
about borrowing and yet do only the units column. Hence, if all of BFZ (BFZ abbreviates "borrow
from zero") were present when rule 4 is deleted, the procedure would generate a star bug. The
work of this star bug appears below:

Only-Do-Units:
                  3         2 9
      3 4 5     3 4¹5     3 0¹7
    - 1 0 2   - 1 2 9   - 1 6 9
          3 X       6 X       8 X

If borrowing were not yet learned and rule 4 were deleted, then reasonable bugs would be
generated. For instance, one reasonable, but as yet unobserved bug does only the units column but
it simply takes the absolute difference there instead of borrowing. This would be the bug set
{Only-Do-Units Smaller-From-Larger}. In short, there is nothing wrong with the deletion of rule 4
per se, but it can create a procedure that mixes competence with incompetence in an unlikely
manner.

Another star bug occurs when rule 13 is deleted from Borrow, given the following version of
column processing and borrowing (this is also a translation of figure 10-1):

Goal: Sub1Col (C)   Type: OR
   8. T<B in C => (Borrow C)
   9. the bottom of C is blank => (Show C)
   10. true => (Diff C)

Goal: Borrow (C)   Type: AND
   11. (Borrow-from (Next-column C))
   12. (Add10 C)
   13. (Diff C)

Deleting rule 13 generates a procedure that sets up to take the column difference after a borrow,
but forgets to actually take it. This leads to the following star bug:

*Blank-With-Borrow:
                  3         2 9
      3 4 5     3 4¹5     3 0¹7
    - 1 0 2   - 1 2 9   - 1 6 9
      2 4 3 ✓   2 1   X   1 3   X

What makes this bug so unlikely is that it leaves a blank in the answer despite the fact that it shows 
a sophisticated knowledge of borrowing. 

It is perhaps possible to put explicit constraints on conjunctive rule deletion in order to block
the deletions that generate the star bugs. However, there is a second way to prevent overgeneration
that will be shown to have some advantages. The basic idea is to change the types of the two goals
in question so that they are not AND goals. This would make the deletion operator inapplicable.
That is, one changes the knowledge representation rather than the operator.

The proposed change is to adopt a new exit convention, the third one mentioned in the
introduction. The exit convention generalizes the binary AND/OR type to become satisfaction
conditions. The basic idea of an AND goal is to pop when all subgoals have been executed, while an
OR goal pops when one subgoal has been executed. The idea of satisfaction conditions is to have a
goal pop when its satisfaction condition is true. Subgoals of a goal are executed until either the
goal's satisfaction condition becomes true, or all the applicable subgoals have been tried. (Note that
this is not an iteration construct, that is, an "until" loop, since a rule can only be executed once.)
AND goals become goals with FALSE satisfaction conditions: Since subgoals are executed until the
satisfaction condition becomes true (which it never does for the AND) or all the subgoals have been
tried, giving a goal FALSE as its satisfaction condition means that it will always execute all its
subgoals. Conversely, OR goals are given the satisfaction condition TRUE: The goal exits after just
one subgoal is executed.
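
The exit convention can be sketched in a few lines of Python. This is an illustrative
interpreter loop under simplifying assumptions (rules are pairs of an applicability test and an
action, and the state is whatever the actions operate on); the function and variable names are
hypothetical. It shows that a constant-false condition behaves like an AND goal and a constant-true
condition behaves like an OR goal.

```python
def run_goal(rules, satisfied, state):
    """Execute a goal's rules under the satisfaction-condition exit convention.

    rules: (applicable, action) pairs, tested in order, each run at most once.
    satisfied: predicate on the state, tested after each executed subgoal.
    Returns the names of the actions that were executed.
    """
    executed = []
    for applicable, action in rules:
        if not applicable(state):
            continue
        action(state)
        executed.append(action.__name__)
        if satisfied(state):
            break                      # goal pops as soon as it is satisfied
    return executed


def always(state):
    return True


def first(state):
    state.append("first")


def second(state):
    state.append("second")


rules = [(always, first), (always, second)]
and_trace = run_goal(rules, lambda state: False, [])   # AND goal: condition FALSE
or_trace = run_goal(rules, lambda state: True, [])     # OR goal: condition TRUE
print(and_trace, or_trace)
```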

With this construction in the knowledge representation language, one is free to represent 
borrowing in the following way: 



Goal: Sub1Col (C)   Satisfaction condition: C's answer is non-blank
   8. T<B in C => (Borrow C)
   9. the bottom of C is blank => (Show C)
   10. true => (Diff C)

Goal: Borrow (C)   Satisfaction condition: false
   11. (Borrow-from (Next-column C))
   12. (Add10 C)



The AND goal, Borrow, now consists of two subgoals. After they are both executed, control returns
to Sub1Col. Because Sub1Col's satisfaction condition is not yet true (the column's answer is still
blank), another subgoal is tried. Diff is chosen and executed, which fills in the column answer.
Now the satisfaction condition is true, so the goal pops.
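
The control flow just described can be traced with a short sketch. It is a simplification made
for illustration: a column is a dictionary with top, bottom and answer entries, Borrow-from is
reduced to a plain decrement of the next column, and all names are hypothetical.

```python
def sub1col(column, next_column):
    """Process one column under the satisfaction condition 'answer is non-blank'."""
    trace = []

    def borrow():
        # Borrow has satisfaction condition false, so both of its rules run.
        next_column["top"] -= 1      # rule 11: Borrow-from, simplified to a decrement
        column["top"] += 10          # rule 12: Add10
        trace.append("Borrow")

    def show():
        column["answer"] = column["top"]
        trace.append("Show")

    def diff():
        column["answer"] = column["top"] - column["bottom"]
        trace.append("Diff")

    rules = [
        (lambda: column["bottom"] is not None and column["top"] < column["bottom"], borrow),
        (lambda: column["bottom"] is None, show),
        (lambda: True, diff),
    ]
    for applicable, action in rules:
        if applicable():
            action()
            if column["answer"] is not None:   # satisfaction condition
                break
    return trace


# Units and tens columns of 345 - 129.
units = {"top": 5, "bottom": 9, "answer": None}
tens = {"top": 4, "bottom": 2, "answer": None}
trace = sub1col(units, tens)
print(trace, units["answer"], tens["top"])
```

After Borrow runs the answer is still blank, so the loop continues to Diff; only then does the
goal pop.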

Given this encoding of borrowing, the conjunctive rule deletion operator does exactly the
right thing when applied to Borrow. In particular, since rule 13 is no longer present, it is no longer
possible to generate the star bug, *Blank-With-Borrow-From, by deleting it. Rule 13 has been
merged, so to speak, with rule 10. Since rule 10 is under a non-AND goal, Sub1Col, it is protected
from deletion.
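
The effect on the deletion operator can be illustrated as follows. The sketch assumes, for
illustration only, that a goal counts as an AND goal exactly when its satisfaction condition is the
constant "false", and it represents a procedure as a dictionary of goals; none of these names come
from the model itself.

```python
def deletable_rules(procedure):
    """Return the rule numbers the conjunctive deletion operator may remove.

    Assumption of this sketch: only rules under goals whose satisfaction
    condition is the constant "false" (AND goals) can be deleted.
    """
    return sorted(
        number
        for goal in procedure.values()
        if goal["satisfaction"] == "false"
        for number in goal["rules"]
    )


and_or_encoding = {
    "Sub1Col": {"satisfaction": "true", "rules": [8, 9, 10]},      # OR goal
    "Borrow": {"satisfaction": "false", "rules": [11, 12, 13]},    # AND goal
}
satisfaction_encoding = {
    "Sub1Col": {"satisfaction": "answer is non-blank", "rules": [8, 9, 10]},
    "Borrow": {"satisfaction": "false", "rules": [11, 12]},
}

before = deletable_rules(and_or_encoding)
after = deletable_rules(satisfaction_encoding)
print(before, after)
```

Rule 13 is deletable under the And-Or encoding but simply does not exist under the
satisfaction-condition encoding, and rule 10 is never a candidate.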

Similarly, the star bug associated with column traversal can be avoided by restructuring the
loop across columns. The two goals Multi and SubRest are replaced by a single goal:

Goal: SubAll (C)   Satisfaction condition: C is the leftmost column
   5. true => (Sub1Col C)
   6. true => (SubAll (Next-column C))

This goal first processes the given column by calling the main column processing goal, Sub1Col.
Then it checks the satisfaction condition. If the column is the problem's leftmost column, the goal
pops. Otherwise, it calls itself recursively. By using a satisfaction condition formulation, generation
of the star bug is avoided. The AND goal, Multi, has been eliminated along with its rule 4, the rule
whose deletion caused the star bug.

These two illustrations indicate that augmenting the representation with satisfaction conditions
creates an empirically adequate treatment of deletion. Satisfaction conditions were used for several
years in Sierra (Brown & VanLehn, 1980; VanLehn, 1983). They also play a crucial role in the
formulation of empirically adequate critics. A critic is a kind of impasse condition. Critics serve
both to trigger repairs and to filter repairs. It turns out that one of the problems with critics can be
solved using satisfaction conditions. In considering this problem, a different approach was
discovered to deletion and deletion blocking that led to the position currently taken by the theory.

AIM The blank answer critic problem

One of the unsolved problems mentioned in Brown and VanLehn (1980) is called the blank
answer critic problem. One function of critics is to filter out repairs that are applicable to the given
impasse whenever they act in such a way as to immediately violate a critic. There is evidence that
there is a critic, called the blank answer critic, that objects to answers that have blanks in them (i.e.,
a number that looks like "34 5" as an answer). The evidence for this critic involves a certain rule
deletion that causes local problem solving.

In the reformulated version of Borrow, there are two rules. One does borrowing-into, the
other does borrowing-from. If the rule that does borrowing-into is deleted, then the Add10
operation of borrowing is skipped. This means that the column difference operation that follows it
will be confronted with a column that is still in its original T<B form. This causes an impasse.
Two different repairs cause the following two bugs to be generated:

Smaller-From-Larger-With-Borrow:
                  3         2 9
      3 4 5     3 4 5     3 0 7
    - 1 0 2   - 1 2 9   - 1 6 9
      2 4 3 ✓   2 1 4 X   1 3 2 X

Zero-After-Borrow:
                  3         2 9
      3 4 5     3 4 5     3 0 7
    - 1 0 2   - 1 2 9   - 1 6 9
      2 4 3 ✓   2 1 0 X   1 3 0 X


The existence of these two bugs establishes that the impasse occurs. However, if this impasse is
repaired with the Noop repair, a repair that causes the stuck operation to be skipped, then the
following star bug is generated:

*Blank-With-Borrow:
                  3         2 9
      3 4 5     3 4 5     3 0 7
    - 1 0 2   - 1 2 9   - 1 6 9
      2 4 3 ✓   2 1   X   1 3   X

After a borrow-from, control returns to the column which initiated the borrow. That column is still
in its original state. When the Diff operation comes to perform the column difference, it hits an
impasse because it can't take a larger number from a smaller one. The Noop repair causes the
stuck operation, in this case Diff, to simply be skipped. Hence, the column is left unanswered,
creating the pattern of answers shown above. This pattern makes the bug a star bug, since such
gapped answers are totally unlikely given the sophistication in borrowing.
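
The three answer patterns can be reproduced with a small simulation. This is a sketch under
stated assumptions: digits are stored units first, both numbers have the same number of digits, and
each repair is modeled as a hypothetical function that supplies the answer for the stuck column
(the actual repairs of the theory are not specified here beyond Noop).

```python
def subtract_with_add10_deleted(top, bottom, repair):
    """Column subtraction in which Borrow-from runs but Add10 has been deleted.

    Digits are listed units first and both numbers have the same length.
    repair(t, b) supplies the answer for a column stuck in T<B form.
    """
    top = list(top)
    answer = []
    for col, b in enumerate(bottom):
        if top[col] < b:
            source = col + 1
            while top[source] == 0:      # borrow from zero: write nine, move left
                top[source] = 9
                source += 1
            top[source] -= 1             # ordinary borrow-from: decrement
            answer.append(repair(top[col], b))   # Diff hits an impasse; repair it
        else:
            answer.append(top[col] - b)
    return "".join("_" if d is None else str(d) for d in reversed(answer))


def digits(n):
    return [int(d) for d in reversed(str(n))]


repairs = {
    "Smaller-From-Larger-With-Borrow": lambda t, b: abs(t - b),
    "Zero-After-Borrow": lambda t, b: 0,
    "*Blank-With-Borrow": lambda t, b: None,   # Noop: skip the stuck Diff
}
problems = [(345, 102), (345, 129), (307, 169)]
results = {
    name: [subtract_with_add10_deleted(digits(t), digits(b), repair) for t, b in problems]
    for name, repair in repairs.items()
}
for name, answers in results.items():
    print(name, answers)
```

An underscore marks a column left blank, which is the gap that makes the third pattern a star
bug.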

The blank answer critic is supposed to block this bug by sensing that the Noop repair will
leave a gap in the answer. The problem is that all the other known critics in subtraction are
preconditions to primitive actions. That is, they are tested just before an action. They guard
against errors of commission. However, the blank answer critic cannot be implemented as a
precondition. The basic problem is that it must sense an error of omission rather than commission.
At the time the Noop repair is being considered, the offending gap is on the left end of the answer
(i.e., the answer string looks like " 34" with the blank on the left). It does not look like a gap yet.
If the blank answer critic is a precondition, it must be very smart. It must be able to read the
control structure of the procedure in order to tell that what is now a harmless boundary between
digits and blanks will become a gap. In short, blocking the Noop repair with a blank answer critic
forces the model to use a very powerful model of critics. If any other way can be found to block
the star bug, this degree of freedom need not be added to the theory.

There is an aspect of gaps in the answers which reveals a way to solve the blank answer critic
problem. It turns out that there is a bug that does leave blanks in the answer:

Blank-Instead-of-Borrow:
      3 4 5     3 4 5     2 0 7
    - 1 0 2   - 1 2 9   - 1 6 9
      2 4 3 ✓   2 2   X   1     X

This bug is generated by assuming that borrowing hasn't been learned yet. When a column is
processed that normally would require borrowing, an impasse occurs since a larger number can't be
taken from a smaller one. Repairing with Noop leaves the answers to such columns blank. The
occurrence of this bug juxtaposed with the non-occurrence of *Blank-With-Borrow shows that
whatever is preventing gaps in the answers must have been acquired. In particular, it must have
been acquired sometime between the stages of learning represented by the two core procedures. It
must have been acquired sometime after borrowing was learned since Blank-Instead-of-Borrow
hasn't learned about borrowing and it also hasn't learned about blanks in the answer.

Satisfaction conditions provide the hook that solves the mystery. Instead of a critic that is
sensitive to blanks, one postulates that a repair is filtered out if it causes a goal to exit unsatisfied.
More accurately, one postulates that there is an impasse condition that is true if (1) a goal pops due
to the fact that all the applicable subgoals have been tried, and (2) that goal has a non-trivial
satisfaction condition that is false. That is, the goal exits unsatisfied. The intuition behind this
impasse condition is that if a goal has specific information that indicates when it is satisfied, and an
attempt is made to exit it without these conditions being true, then it is obvious that something is
wrong, and this triggers local problem solving.

In this case, gaps in the answer are prevented because the main column processing goal,
Sub1Col, has a satisfaction condition that tests whether its answer is blank. It is satisfied only if the
answer is non-blank. When the local problem solver is considering which repair to choose, it
discovers that the Noop repair will leave a blank in the answer, causing an attempt to exit Sub1Col
with the answer left blank. Since the satisfaction condition is false, the new impasse condition is
true. This means the Noop repair must be filtered since repairs are chosen only if they get the
interpreter unstuck (i.e., no impasse condition is true).
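
The filtering step can be sketched as follows. The sketch assumes that the goal has already run
out of applicable subgoals when the repair is tried, represents a trivial satisfaction condition as
a plain constant and a non-trivial one as a function, and uses hypothetical stand-ins
(absolute_difference, write_zero) for the repairs other than Noop.

```python
def exits_unsatisfied(goal, column):
    """Impasse condition for a goal that has run out of applicable subgoals."""
    condition = goal["satisfaction"]
    if not callable(condition):    # constant true/false counts as trivial
        return False
    return not condition(column)


def filter_repairs(repairs, goal, column):
    """Keep only the repairs that do not leave the goal exiting unsatisfied."""
    kept = []
    for name, repair in repairs.items():
        trial = dict(column)       # try the repair on a copy of the column
        repair(trial)
        if not exits_unsatisfied(goal, trial):
            kept.append(name)
    return kept


def noop(column):
    pass  # skip the stuck Diff; the answer stays blank


def absolute_difference(column):
    column["answer"] = abs(column["top"] - column["bottom"])


def write_zero(column):
    column["answer"] = 0


repairs = {"noop": noop, "absolute_difference": absolute_difference, "write_zero": write_zero}
stuck_column = {"top": 5, "bottom": 9, "answer": None}

post_borrow = {"satisfaction": lambda column: column["answer"] is not None}
pre_borrow = {"satisfaction": True}

kept_post = filter_repairs(repairs, post_borrow, stuck_column)
kept_pre = filter_repairs(repairs, pre_borrow, stuck_column)
print(kept_post, kept_pre)
```

With the non-trivial condition the Noop repair is filtered out; with the trivial condition it
survives, which is the contrast between the two kinds of student discussed next.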

The explanation of the acquisition phenomenon rests on assuming that the student who knows
about borrowing has a non-trivial satisfaction condition while the student who has not yet learned
borrowing has a trivial one. That is, the student who has not learned borrowing has a simple
OR-type satisfaction condition for the goal that processes a column:

Goal: Sub1Col (C)   Satisfaction condition: true
   1. C's bottom is blank => (Show C)
   2. true => (Diff C)

whereas the student who has learned borrowing has the non-trivial satisfaction condition version of
Sub1Col:

Goal: Sub1Col (C)   Satisfaction condition: C's answer is non-blank
   1. T<B => (Borrow C)
   2. C's bottom is blank => (Show C)
   3. true => (Diff C)

Goal: Borrow (C)   Satisfaction condition: false
   1. (Add10 C)
   2. (Borrow-from (Next-column C))

In the first case, where borrowing has not been taught and the second occurrence of Diff as
following borrowing has not been seen, there isn't enough evidence to separate the simple OR type
from a non-trivial satisfaction condition. Hence, the learner conservatively takes the simpler
satisfaction condition. In the second case, borrowing has revealed a second occurrence of Diff. The
student infers that the common goal of all the different ways to process a column is to make the
answer be non-blank. This motivates the inclusion of the non-trivial satisfaction condition. This
acquisitional story accounts for the fact that pre-borrow students allow blanks in their answers
because they don't have the non-trivial satisfaction condition yet, and the post-borrow students
block repairs that leave blanks because they do have the non-trivial satisfaction condition.

To summarize, once satisfaction conditions are permitted, they can be used two ways: (1) to
block application of a deletion operator, and (2) to block the repairs that would leave satisfaction
conditions unsatisfied. Moreover, one can account for the acquisition of the blank answer critic by
the somewhat more natural acquisition of the satisfaction condition. There are a number of good
things about this account, but there are also some severe flaws.

A10.2 Problems with satisfaction conditions

The first flaw lies in the "conservative" acquisition of satisfaction conditions. When
satisfaction condition acquisition is spelled out in a little more detail, it is found to make wrong
predictions. The basic idea is that a satisfaction condition is acquired when one learns a setup or
preparation step for some main step. That is, if M is a known action and one learns that a certain
sequence <X Y Z M> is an alternative to M, then one infers that M is a main step and that the
first part of the sequence, <X Y Z>, is preparation for the main step M. For instance, the main
step of column processing is Diff and the preparation steps are borrowing-into and
borrowing-from. When such a preparation subprocedure is taught, the learner sees a second setting
for the main step, Diff. This new setting allows the learner to induce what is common about the two
occurrences and abstract it into a satisfaction condition. The attachment of the new material is done
with a satisfaction condition. In the case of <X Y Z M>, the resulting goal structure would be:

Goal: G   Satisfaction condition: SC
   1. AC => P
   2. true => M

Goal: P   Satisfaction condition: false
   1. X
   2. Y
   3. Z

where SC is some condition achieved by M, and AC is some condition indicating that the new
preparation goal P is needed. This is a fine story in that it ties in well with the step schemata
account of learning (see section 19.2).
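
The attachment step can be written out as a small function. This is only a sketch of the
bookkeeping, not of the induction itself: the satisfaction condition SC and the applicability
condition AC are passed in as opaque labels, and the function name and dictionary layout are
assumptions made for illustration.

```python
def attach_preparation(main_step, new_sequence, satisfaction, applicability, prep_goal="P"):
    """Build the goal structure for a new sequence that ends in a known main step.

    The steps before the main step become an AND-like preparation goal
    (satisfaction condition false); the parent goal gets the induced
    satisfaction condition.
    """
    if len(new_sequence) < 2 or new_sequence[-1] != main_step:
        raise ValueError("new sequence must end with the known main step")
    preparation = list(new_sequence[:-1])
    return {
        "G": {
            "satisfaction": satisfaction,
            "rules": [(applicability, prep_goal), ("true", main_step)],
        },
        prep_goal: {
            "satisfaction": "false",
            "rules": [("true", step) for step in preparation],
        },
    }


structure = attach_preparation("M", ["X", "Y", "Z", "M"], "SC", "AC")
print(structure)
```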

However, it makes wrong predictions. Just as it predicts that regular borrowing will be
learned this way, yielding the all important satisfaction condition of Sub1Col, it also predicts that
borrowing from zero will be learned this way. Teleologically, borrowing is just as much a setup in
one case as in the other. The textbooks even take pains to point this out. So the prediction is that
the following structure would be acquired for BFZ:

Goal: Borrow-from (C)   Satisfaction condition: C's top digit is decremented
   1. T=0 in C => (BFZ C)
   2. true => (Decrement-top C)

Goal: BFZ (C)   Satisfaction condition: false
   1. (Add10 C)
   2. (Borrow-from (Next-column C))



There is a satisfaction condition on the Borrow-from goal, and there is no decrement action under
BFZ. Because the decrement action has been discovered to be a main step, it occurs only once,
under Borrow-from. This structure is isomorphic to the structure used for borrowing, whose main
step is Diff.

Given this decomposition, rule deletion can't generate the deletion bug Don't-Decrement-Zero.
To do so, it needs to delete the decrement rule which occurs after BFZ's Add10 action (rule
18 in figure ) but to leave the decrement rule of simple borrowing alone. Under the goal structure
given above, those two decrement rules are the same. Rule deletion can't delete just one. So, one
of the three crucial deletion bugs can't be generated. If it is to be generated, then satisfaction
condition acquisition must not induce a satisfaction condition for Borrow-from when it learns BFZ.
It must instead use an OR-goal structure, similar to the one used in figure 10-1. But why is a
satisfaction condition learned for Sub1Col but not for Borrow-from? If the acquisitional account
that solves the blank answer critic problem is going to stand up, then it must explain why a
satisfaction condition is acquired for one goal but not the other.

A10.3 Blank answer blocking: the acquisitional timing is wrong 

The acquisitional account given above makes the prediction that the Sub1Col satisfaction 
condition will be acquired when simple, non-zero borrowing is learned. Hence, after students have 
learned simple borrowing, they should no longer leave blanks in their answers. There is some data 
contradicting this prediction. Figure A10-1 reproduces a test taken by a third grader, student 3 of 
class 2. Except for three test items, the student answers as if he had a compound of three bugs (i.e., 
0-N=N, Borrow-Once-Then-Smaller-From-Larger, and Smaller-From-Larger-Instead-of-Borrow- 
From-Zero). One of these three test items is the datum that is important here. It is the very first 
BFZ problem, problem e. From the scratch marks in the tens column, it is apparent that the 
student attempts to decrement the zero and hits an impasse. He apparently repairs using the Backup 
repair. He resumes execution by trying to process the units column and discovers that he can't 
because the column is still in its original T<B form. This is a second impasse. He repairs with the 
Noop repair, causing him to essentially give up on the units column and go on to the next column. 
The rest of the problem he answers in his usual way, which includes the bug 0-N=N (c.f., the 
tens column). His performance of the units column is characteristic of a bug called 
Blank-Instead-of-Borrow-From-Zero. 

Problem e is the only evidence (so far) that this bug exists. However, the analysis of problem 
e is supported by this student's performance on all the other BFZ problems on the test. The other 
BFZ problems are answered with the bug Smaller-From-Larger-Instead-of-Borrow-From-Zero 
(problems m and n are evidence for the bug; problems f, o and r are answered by 0-N=N). This 
bug is generated by following the same course that the derivation of Blank-Instead-of-Borrow-From- 
Zero followed, except that the second impasse, the T<B impasse in the units column, is repaired by 
Refocus instead of Noop. That is, the two bugs are in the same bug migration class. They come 
from the same core procedure. It seems quite clear that this student knows about simple, non-zero 
borrowing. But the first time he encounters a BFZ problem, he impasses and repairs in a way 
characterized by Blank-Instead-of-Borrow-From-Zero. The other BFZ problems 

[Figure: reproduction of the student's handwritten subtraction test, with blanks for name, grade, 
teacher and date, followed by the worked problems; the individual problems are not legible in this copy.] 

Figure A10-1 
A test showing Blank-Instead-of-Borrow-From-Zero 

are repaired slightly differently. This is evidence that Blank-Instead-of-Borrow-From-Zero exists. 
Since there is only this one problem to support the existence of the bug, it could perhaps be 
dismissed as a fluke. Perhaps the student got rattled by the impasse and temporarily abandoned his 
knowledge of the procedure. But he is not so rattled that he fails to finish the problem, so it is 
equally plausible that this is a true bug migration. 
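
The claim that the two bugs come from the same core procedure and differ only in how the second impasse is repaired can be illustrated with a minimal sketch. The function names, the column encoding, and the problem 106 - 38 are assumptions made for illustration only. The sketch also simplifies heavily: it applies 0-N=N to every column whose top digit is zero and treats every T<B column as the post-Backup impasse.

```python
def solve_column(top, bottom, second_repair):
    """Answer one column after the borrow attempt has been backed out."""
    if top == 0 and bottom > 0:
        return str(bottom)              # the student's usual bug, 0-N=N
    if top >= bottom:
        return str(top - bottom)        # ordinary column difference
    # T<B impasse: the column still needs a borrow that was abandoned.
    if second_repair == "Noop":
        return " "                      # skip the column, leaving a blank
    if second_repair == "Refocus":
        return str(bottom - top)        # take the smaller from the larger
    raise ValueError(f"unknown repair: {second_repair}")


def answer(top_digits, bottom_digits, second_repair):
    """Concatenate column answers, leftmost column first."""
    return "".join(solve_column(t, b, second_repair)
                   for t, b in zip(top_digits, bottom_digits))


# Hypothetical problem 106 - 38, written column by column.
print(repr(answer([1, 0, 6], [0, 3, 8], "Noop")))     # blank in the units column
print(repr(answer([1, 0, 6], [0, 3, 8], "Refocus")))  # smaller-from-larger instead
```

Only the `second_repair` argument changes between the two calls, which is the sense in which the two bugs belong to one bug migration class.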

The existence of this bug contradicts the conjecture that students will stop leaving blanks in 
the answer when they learn simple borrowing. Here is a student who has learned simple borrowing, 
and yet he leaves a blank in the answer. The prediction still stands that by the time students have 
learned BFZ, they will stop leaving blanks in the answer (c.f., the earlier discussion of *Blank-With- 
Borrow). However, it appears that linking the acquisition of blank-blocking with borrowing is a 
little premature. This casts doubt on the whole satisfaction condition framework for blocking the 
star bugs that generate blanks. 

A10.4 Ill-formed answers 

Two arguments were given against using satisfaction conditions to block the star bugs that 
generate gaps in answers. One was based on the fact that Sub1Col needs to have a satisfaction 
condition but Borrow-from need not have one. Any account of how satisfaction conditions are 
acquired would have to explain why one goal and not the other acquires a satisfaction condition. 
The second argument indicated that the acquisition of the blank-blocking subskill might not occur 
at the time the appropriate satisfaction conditions are acquired. 

A totally different approach to the blank answer critic problem is to focus on the notation 
rather than the procedure. The basic idea is that it is not the fact that Sub1Col wants to answer 
columns that prevents blanks, but the fact that answers must have a certain syntax, and that syntax 
excludes blanks in the middle of numbers. This solves the mystery of why some goals seem to 
acquire satisfaction conditions and others don't. Sub1Col seems to have a satisfaction condition 
because answer blanks are blocked by knowledge that they make the answer ill-formed. That is, 
*Blank-With-Borrow is blocked because it produces syntactically ill-formed notation. On the other 
hand, Borrow-from seems not to have a satisfaction condition because there is nothing ill-formed 
about a column that lacks a decrement. Hence, the bug Don't-Decrement-Zero is not blocked, 
because it generates only syntactically correct notation. So the general idea is that no goal has a 
satisfaction condition. What appeared to be satisfaction conditions was just syntactic knowledge 
being applied somehow to block bugs. 
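
One way to picture syntactic knowledge acting as a filter is sketched below. The answer grammar (optional leading blanks followed by digits only), the function names, and the candidate answers are all assumptions for illustration. The text leaves the mechanism open, and nothing like this exists in Sierra.

```python
import re

# Assumed answer syntax: optional leading blanks, then one or more digits.
# A blank after the first digit (inside or at the end) makes the answer ill-formed.
WELL_FORMED = re.compile(r" *[0-9]+")


def is_well_formed(answer):
    """True if the answer row matches the assumed notational syntax."""
    return WELL_FORMED.fullmatch(answer) is not None


def filter_repairs(candidates):
    """Keep only the repairs whose resulting answer is well formed."""
    return {repair: ans for repair, ans in candidates.items()
            if is_well_formed(ans)}


# Hypothetical outcomes of two repairs to the same impasse.
candidates = {"Noop": "13 ", "Refocus": "132"}
print(filter_repairs(candidates))
```

A student who has not yet acquired this notational knowledge would simply lack the filter, so the blank-producing repair would survive regardless of which goals he has learned.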

Using knowledge of notational syntax to block the star bugs also solves the mystery involving 
the timing of acquisition. The acquisition of knowledge about the notation would be decoupled 
from the acquisition of the procedure per se. Hence, there would be nothing unusual about a 
student, such as the third grader mentioned above, who knew how to do non-zero borrowing but 
did not know to filter repairs that leave blanks in the answer. The acquisition of borrowing 
apparently occurred before the knowledge of the ill-formedness of gapped answers. 

The presence of such notational knowledge is much clearer in algebra than in subtraction. It's 
a widely accepted empirical generalization that algebra students almost always produce syntactically 
well-formed answers. The answers might be wrong, but they are syntactically correct. Carry et al. 
(1978) present hundreds of error types, and all are syntactically well formed. Indeed, Carry et al. 
assume that students impose syntactic well-formedness on their answers in order to explain several 
classes of errors. For instance, they propose a general deletion transformation that excises 
subexpressions from algebraic expressions. A common example involves cancelling, as in 

            x          1 
          -----   =   --- 
          3 + x        3 

The two instances of the variable have been cancelled. Carry et al. note that if such deletions were 
taken literally, they would leave syntactically malformed expressions: 

            x 
          -----   =   ----- 
          3 + x        3 + 

Subjects don't do this. They fill in blanks with zero or one, and they delete extra operator signs. 
Apparently, they do this in order to make the output expression syntactically well formed. This 
adherence to well-formedness applies to intermediate expressions as well as to the final expression. 
In particular, when the task is to solve an equation, one sees line after line of well-formed equations 
produced. Apparently, students will repair syntactic malformations of intermediate expressions 
before going on to the next transformation. They don't wait until the end to check the syntax. The 
same is true of blanks in subtraction answers. The students don't wait until the end to repair a 
blank; they fill it right away. 
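
The cancelling example can be traced with a short sketch of a literal deletion followed by a syntactic repair. The token-list encoding, the function names, and the choice of one as the default filler are illustrative assumptions. Carry et al. describe the behaviour, not this procedure.

```python
BINARY_OPERATORS = {"+", "*", "/"}


def delete_symbol(tokens, symbol):
    """Literal deletion: remove every occurrence of a symbol."""
    return [t for t in tokens if t != symbol]


def repair_expression(tokens, filler="1"):
    """Make a token list well formed again after a literal deletion."""
    repaired = list(tokens)
    # Delete operator signs left dangling at the end of the expression.
    while repaired and repaired[-1] in BINARY_OPERATORS | {"-"}:
        repaired.pop()
    # Delete binary operators left dangling at the start (a leading minus is legal).
    while repaired and repaired[0] in BINARY_OPERATORS:
        repaired.pop(0)
    # An emptied expression is filled in rather than left blank.
    return repaired or [filler]


numerator = delete_symbol(["x"], "x")               # blank numerator
denominator = delete_symbol(["3", "+", "x"], "x")   # dangling "+"
print(repair_expression(numerator), repair_expression(denominator))
```

Applying the repair immediately after each deletion, rather than once at the end, mirrors the observation that intermediate expressions are kept well formed line by line.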

This solution to the problem of generating star bugs is not well formulated yet. In particular, 
nothing implementing it exists in Sierra at this time. However, it is a more promising direction 
than using satisfaction conditions to achieve the same blocking. 

Taking the syntactic approach removes the motivation for satisfaction conditions. There is no 
reason to use the more powerful goal type now. To keep satisfaction conditions in the 
representation anyway is possible, but creates the problem that the learner must be equipped to 
learn satisfaction conditions. As indicated earlier, satisfaction condition acquisition is problematic. 
Since there is no motivation for satisfaction conditions and their presence in the representation 
creates extra difficulties for the learning theory, the satisfaction condition position will be 
abandoned. The goal type convention for the representation will be the And/Or convention, which 
is simpler and weaker than the satisfaction conditions anyway. 